The Shallowness of Google Translate
The program uses state-of-the-art AI techniques, but simple tests show that it's a long way from real understanding.
Douglas Hofstadter, Jan 30, 2018

One Sunday, at one of our weekly salsa sessions, my friend Frank brought along a Danish guest. I knew Frank spoke Danish well, since his mother was Danish, and he, as a child, had lived in Denmark. As for his friend, her English was fluent, as is standard for Scandinavians. However, to my surprise, during the evening’s chitchat it emerged that the two friends habitually exchanged emails using Google Translate. Frank would write a message in English, then run it through Google Translate to produce a new text in Danish; conversely, she would write a message in Danish, then let Google Translate anglicize it. How odd! Why would two intelligent people, each of whom spoke the other’s language well, do this? My own experiences with machine-translation software had always led me to be highly skeptical about it. But my skepticism was clearly not shared by these two. Indeed, many thoughtful people are quite enamored of translation programs, finding little to criticize in them. This baffles me.

As a language lover and an impassioned translator, as a cognitive scientist and a lifelong admirer of the human mind’s subtlety, I have followed the attempts to mechanize translation for decades. When I first got interested in the subject, in the mid-1970s, I ran across a letter written in 1947 by the mathematician Warren Weaver, an early machine-translation advocate, to Norbert Wiener, a key figure in cybernetics, in which Weaver made this curious claim, today quite famous:

When I look at an article in Russian, I say, “This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.”

Some years later he offered a different viewpoint: “No reasonable person thinks that a machine translation can ever achieve elegance and style. Pushkin need not shudder.” Whew! Having devoted one unforgettably intense year of my life to translating Alexander Pushkin’s sparkling novel in verse Eugene Onegin into my native tongue (that is, having radically reworked that great Russian work into an English-language novel in verse), I find this remark of Weaver’s far more congenial than his earlier remark, which reveals a strangely simplistic view of language. Nonetheless, his 1947 view of translation-as-decoding became a credo that has long driven the field of machine translation.

Since those days, “translation engines” have gradually improved, and recently the use of so-called “deep neural nets” has even suggested to some observers (see “The Great AI Awakening” by Gideon Lewis-Kraus in The New York Times Magazine, and “Machine Translation: Beyond Babel” by Lane Greene in The Economist) that human translators may be an endangered species. In this scenario, human translators would become, within a few years, mere quality controllers and glitch fixers, rather than producers of fresh new text. Such a development would cause a soul-shattering upheaval in my mental life. Although I fully understand the fascination of trying to get machines to translate well, I am not in the least eager to see human translators replaced by inanimate machines. Indeed, the idea frightens and revolts me.
To my mind, translation is an incredibly subtle art that draws constantly on one’s many years of experience in life, and on one’s creative imagination. If, some “fine” day, human translators were to become relics of the past, my respect for the human mind would be profoundly shaken, and the shock would leave me reeling with terrible confusion and immense, permanent sadness.

Each time I read an article claiming that the guild of human translators will soon be forced to bow down before the terrible swift sword of some new technology, I feel the need to check the claims out myself, partly out of a sense of terror that this nightmare just might be around the corner, more hopefully out of a desire to reassure myself that it’s not just around the corner, and finally, out of my longstanding belief that it’s important to combat exaggerated claims about artificial intelligence. And so, after reading about how the old idea of artificial neural networks, recently adopted by a branch of Google called Google Brain, and now enhanced by “deep learning,” has resulted in a new kind of software that has allegedly revolutionized machine translation, I decided I had to check out the latest incarnation of Google Translate. Was it a game changer, as Deep Blue and AlphaGo were for the venerable games of chess and Go?

I learned that although the older version of Google Translate can handle a very large repertoire of languages, its new deep-learning incarnation at the time worked for just nine languages. (It’s now expanded to 96.) Accordingly, I limited my explorations to English, French, German, and Chinese.

Before showing my findings, though, I should point out that an ambiguity in the adjective “deep” is being exploited here. When one hears that Google bought a company called DeepMind whose products have “deep neural networks” enhanced by “deep learning,” one cannot help taking the word “deep” to mean “profound,” and thus “powerful,” “insightful,” “wise.” And yet, the meaning of “deep” in this context comes simply from the fact that these neural networks have more layers (12, say) than do older networks, which might have only two or three. But does that sort of depth imply that whatever such a network does must be profound? Hardly. This is verbal spinmeistery.

I am very wary of Google Translate, especially given all the hype surrounding it. But despite my distaste, I recognize some astonishing facts about this bête noire of mine. It is accessible for free to anyone on earth, and will convert text in any of roughly 100 languages into text in any of the others. That is humbling. If I am proud to call myself “pi-lingual” (meaning the sum of all my fractional languages is a bit over 3, which is my lighthearted way of answering the question “How many languages do you speak?”), then how much prouder should Google Translate be, since it could call itself “bai-lingual” (“bai” being Mandarin for 100). To a mere pilingual, bailingualism is most impressive. Moreover, if I copy and paste a page of text in Language A into Google Translate, only moments will elapse before I get back a page filled with words in Language B. And this is happening all the time on screens all over the planet, in dozens of languages.

The practical utility of Google Translate and similar technologies is undeniable, and probably it’s a good thing overall, but there is still something deeply lacking in the approach, which is conveyed by a single word: understanding. Machine translation has never focused on understanding language.
Instead, the field has always tried to “decode”—to get away without worrying about what understanding and meaning are. Could it in fact be that understanding isn’t needed in order to translate well? Could an entity, human or machine, do high-quality translation without paying attention to what language is all about? To shed some light on this question, I turn now to the experiments I made.

I began my explorations very humbly, using the following short remark, which, in a human mind, evokes a clear scenario:

In their house, everything comes in pairs. There’s his car and her car, his towels and her towels, and his library and hers.

The translation challenge seems straightforward, but in French (and other Romance languages), the words for “his” and “her” don’t agree in gender with the possessor, but with the item possessed. So here’s what Google Translate gave me:

Dans leur maison, tout vient en paires. Il y a sa voiture et sa voiture, ses serviettes et ses serviettes, sa bibliothèque et les siennes.

The program fell into my trap, not realizing, as any human reader would, that I was describing a couple, stressing that for each item he had, she had a similar one. For example, the deep-learning engine used the word “sa” for both “his car” and “her car,” so you can’t tell anything about either car-owner’s gender. Likewise, it used the genderless plural “ses” both for “his towels” and “her towels,” and in the last case of the two libraries, his and hers, it got thrown by the final “s” in “hers” and somehow decided that that “s” represented a plural (“les siennes”). Google Translate’s French sentence missed the whole point.

Next I translated the challenge phrase into French myself, in a way that did preserve the intended meaning. Here’s my French version:

Chez eux, ils ont tout en double. Il y a sa voiture à elle et sa voiture à lui, ses serviettes à elle et ses serviettes à lui, sa bibliothèque à elle et sa bibliothèque à lui.

The phrase “sa voiture à elle” spells out the idea “her car,” and similarly, “sa voiture à lui” can only be heard as meaning “his car.” At this point, I figured it would be trivial for Google Translate to carry my French translation back into English and get the English right on the money, but I was dead wrong. Here’s what it gave me:

At home, they have everything in double. There is his own car and his own car, his own towels and his own towels, his own library and his own library.

What?! Even with the input sentence screaming out the owners’ genders as loudly as possible, the translating machine ignored the screams and made everything masculine. Why did it throw the sentence’s most crucial information away?

We humans know all sorts of things about couples, houses, personal possessions, pride, rivalry, jealousy, privacy, and many other intangibles that lead to such quirks as a married couple having towels embroidered “his” and “hers.” Google Translate isn’t familiar with such situations. Google Translate isn’t familiar with situations, period. It’s familiar solely with strings composed of words composed of letters. It’s all about ultrarapid processing of pieces of text, not about thinking or imagining or remembering or understanding. It doesn’t even know that words stand for things. Let me hasten to say that a computer program certainly could, in principle, know what language is for, and could have ideas and memories and experiences, and could put them to use, but that’s not what Google Translate was designed to do. Such an ambition wasn’t even on its designers’ radar screens.
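(For readers who want to try this kind of probe themselves, here is a minimal sketch, in Python, of the round-trip test I performed by hand. The translate() function is a hypothetical placeholder rather than any particular product's API; wire it to whatever translation service you have access to.)

# A minimal sketch of the round-trip probe described above.
# translate() is a hypothetical placeholder, not a real library call:
# substitute any machine-translation service that maps
# (text, source language, target language) to a string.

def translate(text: str, src: str, dst: str) -> str:
    return text  # identity stand-in; replace with a real MT call

def round_trip(sentence: str, src: str = "en", dst: str = "fr") -> None:
    forward = translate(sentence, src, dst)    # e.g., English to French
    backward = translate(forward, dst, src)    # ... and back again
    print("original:  ", sentence)
    print("forward:   ", forward)
    print("round trip:", backward)
    # A faithful engine should at least preserve who owns what; the
    # gender-marking trap above shows how badly this can go wrong.

round_trip("In their house, everything comes in pairs. There's his car "
           "and her car, his towels and her towels, and his library and hers.")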
Well, I chuckled at these poor shows, relieved to see that we aren’t, after all, so close to replacing human translators by automata. But I still felt I should check the engine out more closely. After all, one swallow does not thirst quench.

Indeed, what about this freshly coined phrase “One swallow does not thirst quench” (alluding, of course, to “One swallow does not a summer make”)? I couldn’t resist trying it out; here’s what Google Translate flipped back at me: “Une hirondelle n’aspire pas la soif.” This is a grammatical French sentence, but it’s pretty hard to fathom. First it names a certain bird (“une hirondelle”—a swallow), then it says this bird is not inhaling or not sucking (“n’aspire pas”), and finally reveals that the neither-inhaled-nor-sucked item is thirst (“la soif”). Clearly Google Translate didn’t catch my meaning; it merely came out with a heap of bull. “Il sortait simplement avec un tas de taureau.” “He just went out with a pile of bulls.” “Il vient de sortir avec un tas de taureaux.” Please pardon my French—or rather, Google Translate’s pseudo-French.

From the frying pan of French, let’s jump into the fire of German. Of late I’ve been engrossed in the book Sie nannten sich der Wiener Kreis (They Called Themselves the Vienna Circle), by the Austrian mathematician Karl Sigmund. It describes a group of idealistic Viennese intellectuals in the 1920s and 1930s, who had a major impact on philosophy and science during the rest of the century. I chose a short passage from Sigmund’s book and gave it to Google Translate. Here it is, first in German, followed by my own translation, and then Google Translate’s version. (By the way, I checked my translation with two native speakers of German, including Karl Sigmund, so I think you can assume it is accurate.)

Sigmund: Nach dem verlorenen Krieg sahen es viele deutschnationale Professoren, inzwischen die Mehrheit in der Fakultät, gewissermaßen als ihre Pflicht an, die Hochschulen vor den “Ungeraden” zu bewahren; am schutzlosesten waren junge Wissenschaftler vor ihrer Habilitation. Und Wissenschaftlerinnen kamen sowieso nicht in frage; über wenig war man sich einiger.

Hofstadter: After the defeat, many professors with Pan-Germanistic leanings, who by that time constituted the majority of the faculty, considered it pretty much their duty to protect the institutions of higher learning from “undesirables.” The most likely to be dismissed were young scholars who had not yet earned the right to teach university classes. As for female scholars, well, they had no place in the system at all; nothing was clearer than that.

Google Translate: After the lost war, many German-National professors, meanwhile the majority in the faculty, saw themselves as their duty to keep the universities from the “odd”; Young scientists were most vulnerable before their habilitation. And scientists did not question anyway; There were few of them.

The words in Google Translate’s output are all English words (even if, for unclear reasons, a couple are inappropriately capitalized). So far, so good! But soon it grows wobbly, and the further down you go the wobblier it gets.
I’ll focus first on “the ‘odd.’” This corresponds to the German “die ‘Ungeraden,’” which here means “politically undesirable people.” Google Translate, however, had a reason—a very simple statistical reason—for choosing the word “odd.” Namely, in its huge bilingual database, the word “ungerade” was almost always translated as “odd.” Although the engine didn’t realize why this was the case, I can tell you why. It’s because “ungerade”—which literally means “un-straight” or “uneven”—nearly always means “not divisible by two.” By contrast, my choice of “undesirables” to render “Ungeraden” had nothing to do with the statistics of words, but came from my understanding of the situation—from my zeroing in on a notion not explicitly mentioned in the text and certainly not listed as a translation of “ungerade” in any of my German dictionaries. Let’s move on to the German “Habilitation,” denoting a university status resembling tenure. The English cognate word “habilitation” exists but it is super-rare, and certainly doesn’t bring to mind tenure or anything like it. That’s why I briefly explained the idea rather than just quoting the obscure word, since that mechanical gesture would not get anything across to anglophonic readers. Of course Google Translate would never do anything like this, as it has no model of its readers’ knowledge. The last two sentences really bring out how crucial understanding is for translation. The 15-letter German noun “Wissenschaftler” means either “scientist” or “scholar.” (I opted for the latter, as in this context it was referring to intellectuals in general. Google Translate didn’t get that subtlety.) The related 17-letter noun “Wissenschaftlerin,” found in the closing sentence in its plural form “Wissenschaftlerinnen,” is a consequence of the gendered-ness of German nouns. Whereas the “short” noun is grammatically masculine and thus suggests a male scholar, the longer noun is feminine and applies to females only. I wrote “female scholar” to get the idea across. Google Translate, however, did not understand that the feminizing suffix “-in” was the central focus of attention in the final sentence. Since it didn’t realize that females were being singled out, the engine merely reused the word “scientist,” thus missing the sentence’s entire point. As in the earlier French case, Google Translate didn’t have the foggiest idea that the sole purpose of the German sentence was to shine a spotlight on a contrast between males and females. Aside from that blunder, the rest of the final sentence is a disaster. Take its first half. Is “scientists did not question anyway” really a translation of “Wissenschaftlerinnen kamen sowieso nicht in frage”? It doesn’t mean what the original means—it’s not even in the same ballpark. It just consists of English words haphazardly triggered by the German words. Is that all it takes for a piece of output to deserve the label “translation”? The sentence’s second half is equally erroneous. The last six German words mean, literally, “over little was one more united,” or, more flowingly, “there was little about which people were more in agreement,” yet Google Translate managed to turn that perfectly clear idea into “There were few of them.” We baffled humans might ask “Few of what?” but to the mechanical listener, such a question would be meaningless. Google Translate doesn’t have ideas behind the scenes, so it couldn’t even begin to answer the simple-seeming query. The translation engine was not imagining large or small amounts or numbers of things. 
It was just throwing symbols around, without any notion that they might symbolize something. It’s hard for a human, with a lifetime of experience and understanding and of using words in a meaningful way, to realize how devoid of content all the words thrown onto the screen by Google Translate are. It’s almost irresistible for people to presume that a piece of software that deals so fluently with words must surely know what they mean. This classic illusion associated with artificial-intelligence programs is called the “Eliza effect,” since one of the first programs to pull the wool over people’s eyes with its seeming understanding of English, back in the 1960s, was a vacuous phrase manipulator called Eliza, which pretended to be a psychotherapist, and as such, it gave many people who interacted with it the eerie sensation that it deeply understood their innermost feelings. For decades, sophisticated people—even some artificial-intelligence researchers—have fallen for the Eliza effect. In order to make sure that my readers steer clear of this trap, let me quote some phrases from a few paragraphs up—namely, “Google Translate did not understand,” “it did not realize,” and “Google Translate didn’t have the foggiest idea.” Paradoxically, these phrases, despite harping on the lack of understanding, almost suggest that Google Translate might at least sometimes be capable of understanding what a word or a phrase or a sentence means, or is about. But that isn’t the case. Google Translate is all about bypassing or circumventing the act of understanding language. To me, the word “translation” exudes a mysterious and evocative aura. It denotes a profoundly human art form that graciously carries clear ideas in Language A into clear ideas in Language B, and the bridging act not only should maintain clarity, but also should give a sense for the flavor, quirks, and idiosyncrasies of the writing style of the original author. Whenever I translate, I first read the original text carefully and internalize the ideas as clearly as I can, letting them slosh back and forth in my mind. It’s not that the words of the original are sloshing back and forth; it’s the ideas that are triggering all sorts of related ideas, creating a rich halo of related scenarios in my mind. Needless to say, most of this halo is unconscious. Only when the halo has been evoked sufficiently in my mind do I start to try to express it—to “press it out”—in the second language. I try to say in Language B what strikes me as a natural B-ish way to talk about the kinds of situations that constitute the halo of meaning in question. I am not, in short, moving straight from words and phrases in Language A to words and phrases in Language B. Instead, I am unconsciously conjuring up images, scenes, and ideas, dredging up experiences I myself have had (or have read about, or seen in movies, or heard from friends), and only when this nonverbal, imagistic, experiential, mental “halo” has been realized—only when the elusive bubble of meaning is floating in my brain—do I start the process of formulating words and phrases in the target language, and then revising, revising, and revising. This process, mediated via meaning, may sound sluggish, and indeed, in comparison with Google Translate’s two or three seconds per page, it certainly is—but it is what any serious human translator does. 
This is the kind of thing I imagine when I hear an evocative phrase like “deep mind.” That said, I turn now to Chinese, a language that gave the deep-learning software a far rougher ride than the two European languages did. For my test material, I drew from the touching memoir Women Sa (We Three), written by the Chinese playwright and translator Yang Jiang, who recently died at 104. Her book recounts the intertwined lives of herself, her husband Qian Zhongshu (also a novelist and translator), and their daughter. It is not written in an especially arcane manner, but it uses an educated, lively Chinese. I chose a short passage and let Google Translate loose on it. Here are the results, along with my own translation (again vetted by native speakers of Chinese): Yang: 锺书到清华工作一年后,调任毛选翻译委员会的工作,住在城里,周末回校。 他仍兼管研究生。 毛选翻译委员会的领导是徐永煐同志。介绍锺书做这份工作的是清华同学乔冠华同志。 事定之日,晚饭后,有一位旧友特雇黄包车从城里赶来祝贺。客去后,锺书惶恐地对我说: 他以为我要做“南书房行走”了。这件事不是好做的,不求有功,但求无过。 Hofstadter: After Zhongshu had worked at Tsinghua University for a year, he was transferred to the committee that was translating selected works of Chairman Mao. He lived in the city, but each weekend he would return to school. He also was still supervising his graduate students. The leader of the translation committee of Mao’s works was Comrade Xu Yongying, and the person who had arranged for Zhongshu to do this work was his old Tsinghua schoolmate, Comrade Qiao Guanhua. On the day this appointment was decided, after dinner, an old friend specially hired a rickshaw and came all the way from the city just to congratulate Zhongshu. After our guest had left, Zhongshu turned to me uneasily and said: “He thought I was going to become a ‘South Study special aide.’ This kind of work is not easy. You can’t hope for glory; all you can hope for is to do it without errors.” Google Translate: After a year of work at Tsinghua, he was transferred to the Mao Translating Committee to live in the city and back to school on weekends. He is still a graduate student. The leadership of the Mao Tse Translation Committee is Comrade Xu Yongjian. Introduction to the book to do this work is Tsinghua students Qiao Guanhua comrades. On the day of the event, after dinner, an old friend hired a rickshaw from the city to congratulate. Guest to go, the book of fear in the book said to me: He thought I had to do “South study walking.” This is not a good thing to do, not for meritorious service, but for nothing. I’ll briefly point out a few oddities. First of all, Google Translate never refers to Zhongshu by name, although his name (“锺书”) occurs three times in the original. The first time, the engine uses the pronoun “he”; the second time around it says “the book”; the third time it says “the book of fear in the book.” Go figure! A second oddity is that the first paragraph clearly says that Zhongshu is supervising graduate students, whereas Google Translate turns him into a graduate student. A third oddity is that in the phrase “Mao Tse Translation Committee,” one third of Chairman Mao Tse Tung’s name fell off the train. A fourth oddity is that the name “Yongying” was replaced by “Yongjian.” A fifth oddity is that “after our guest had left” was reduced to “guest to go.” A sixth oddity is that the last sentence makes no sense at all. Well, these six oddities are already quite a bit of humble pie for Google Translate to swallow, but let’s forgive and forget. Instead, I’ll focus in on just one confusing phrase I ran into—a five-character phrase in quotation marks in the last paragraph (“南书房行走”). 
Character for character, it might be rendered as “south book room go walk,” but that jumble is clearly unacceptable, especially as the context requires it to be a noun. Google Translate invented “South study walking,” which is not helpful.

Now I admit that the Chinese phrase was utterly opaque to me. Although literally it looked like it meant something about moving about on foot in a study on the south side of some building, I knew that couldn’t be right; it made no sense in the context. To translate it, I had to find out about something in Chinese culture that I was ignorant of. So where did I turn for help? To Google! (But not to Google Translate.) I typed in the Chinese characters, surrounded them by quote marks, then did a Google search for that exact literal string. Lickety-split, up came a bunch of web pages in Chinese, and then I painfully slogged my way through the opening paragraphs of the first couple of websites, trying to figure out what the phrase was all about.

I discovered that the term dates back to the Qing Dynasty (1644–1911), and refers to an intellectual assistant to the emperor, whose duty was to help the emperor (in the imperial palace’s south study) stylishly craft official statements. The two characters that seem to mean “go walk” actually form a chunk denoting an aide. And so, given that information supplied by Google Search, I came up with my phrase “South Study special aide.”

It’s too bad Google Translate couldn’t avail itself of the services of Google Search as I did, isn’t it? But then again, Google Translate can’t understand web pages, although it can translate them in the twinkling of an eye. Or can it? Below I exhibit the astounding piece of output text that Google Translate super-swiftly spattered across my screen after being fed the opening of the website that I got my info from:

“South study walking” is not an official position, before the Qing era this is just a “messenger,” generally by the then imperial intellectuals Hanlin to serve as. South study in the Hanlin officials in the “select chencai only goods and excellent” into the value, called “South study walking.” Because of the close to the emperor, the emperor’s decision to have a certain influence. Yongzheng later set up “military aircraft,” the Minister of the military machine, full-time, although the study is still Hanlin into the value, but has no participation in government affairs. Scholars in the Qing Dynasty into the value of the South study proud. Many scholars and scholars in the early Qing Dynasty into the south through the study.

Is this actually in English? Of course we all agree that it’s made of English words (for the most part, anyway), but does that imply that it’s a passage in English? To my mind, since the above paragraph contains no meaning, it’s not in English; it’s just a jumble made of English ingredients—a random word salad, an incoherent hodgepodge.

In case you’re curious, here’s my version of the same passage (it took me hours): The nan-shufang-xingzou (“South Study special aide”) was not an official position, but in the early Qing Dynasty it was a special role generally filled by whoever was the emperor’s current intellectual academician.
The group of academicians who worked in the imperial palace’s south study would choose, among themselves, someone of great talent and good character to serve as ghostwriter for the emperor, and always to be at the emperor’s beck and call; that is why this role was called “South Study special aide.” The South Study aide, being so close to the emperor, was clearly in a position to influence the latter’s policy decisions. However, after Emperor Yongzheng established an official military ministry with a minister and various lower positions, the South Study aide, despite still being in the service of the emperor, no longer played a major role in governmental decision-making. Nonetheless, Qing Dynasty scholars were eager for the glory of working in the emperor’s south study, and during the early part of that dynasty, quite a few famous scholars served the emperor as South Study special aides. Some readers may suspect that I, in order to bash Google Translate, cherry-picked passages on which it stumbled terribly, and that it actually does far better on the vast majority of passages. Though that sounds plausible, it’s not the case. Nearly every paragraph I selected from books I’m currently reading gave rise to translation blunders of all shapes and sizes, including senseless and incomprehensible phrases, as above. Of course I grant that Google Translate sometimes comes up with a series of output sentences that sound fine (although they may be misleading or utterly wrong). A whole paragraph or two may come out superbly, giving the illusion that Google Translate knows what it is doing, understands what it is “reading.” In such cases, Google Translate seems truly impressive—almost human! Praise is certainly due to its creators and their collective hard work. But at the same time, don’t forget what Google Translate did with these two Chinese passages, and with the earlier French and German passages. To understand such failures, one has to keep the ELIZA effect in mind. The bailingual engine isn’t reading anything—not in the normal human sense of the verb “to read.” It’s processing text. The symbols it’s processing are disconnected from experiences in the world. It has no memories on which to draw, no imagery, no understanding, no meaning residing behind the words it so rapidly flings around. A friend asked me whether Google Translate’s level of skill isn’t merely a function of the program’s database. He figured that if you multiplied the database by a factor of, say, a million or a billion, eventually it would be able to translate anything thrown at it, and essentially perfectly. I don’t think so. Having ever more “big data” won’t bring you any closer to understanding, since understanding involves having ideas, and lack of ideas is the root of all the problems for machine translation today. So I would venture that bigger databases—even vastly bigger ones—won’t turn the trick. Another natural question is whether Google Translate’s use of neural networks—a gesture toward imitating brains—is bringing us closer to genuine understanding of language by machines. This sounds plausible at first, but there’s still no attempt being made to go beyond the surface level of words and phrases. All sorts of statistical facts about the huge databases are embodied in the neural nets, but these statistics merely relate words to other words, not to ideas. There’s no attempt to create internal structures that could be thought of as ideas, images, memories, or experiences. 
Such mental etherea are still far too elusive to deal with computationally, and so, as a substitute, fast and sophisticated statistical word-clustering algorithms are used. But the results of such techniques are no match for actually having ideas involved as one reads, understands, creates, modifies, and judges a piece of writing. Despite my negativism, Google Translate offers a service many people value highly: It effects quick-and-dirty conversions of meaningful passages written in language A into not necessarily meaningful strings of words in language B. As long as the text in language B is somewhat comprehensible, many people feel perfectly satisfied with the end product. If they can “get the basic idea” of a passage in a language they don’t know, they’re happy. This isn’t what I personally think the word “translation” means, but to some people it’s a great service, and to them it qualifies as translation. Well, I can see what they want, and I understand that they’re happy. Lucky them! I’ve recently seen bar graphs made by technophiles that claim to represent the “quality” of translations done by humans and by computers, and these graphs depict the latest translation engines as being within striking distance of human-level translation. To me, however, such quantification of the unquantifiable reeks of pseudoscience, or, if you prefer, of nerds trying to mathematize things whose intangible, subtle, artistic nature eludes them. To my mind, Google Translate’s output today ranges all the way from excellent to grotesque, but I can’t quantify my feelings about it. Think of my first example involving “his” and “her” items. The idealess program got nearly all the words right, but despite that slight success, it totally missed the point. How, in such a case, should one “quantify” the quality of the job? The use of scientific-looking bar graphs to represent translation quality is simply an abuse of the external trappings of science. Let me return to that sad image of human translators, soon outdone and outmoded, gradually turning into nothing but quality controllers and text tweakers. That’s a recipe for mediocrity at best. A serious artist doesn’t start with a kitschy piece of error-ridden bilgewater and then patch it up here and there to produce a work of high art. That’s not the nature of art. And translation is an art. In my writings over the years, I’ve always maintained that the human brain is a machine—a very complicated kind of machine—and I’ve vigorously opposed those who say that machines are intrinsically incapable of dealing with meaning. There is even a school of philosophers who claim computers could never “have semantics” because they’re made of “the wrong stuff” (silicon). To me, that’s facile nonsense. I won’t touch that debate here, but I wouldn’t want to leave readers with the impression that I believe intelligence and understanding to be forever inaccessible to computers. If in this essay I seem to come across sounding that way, it’s because the technology I’ve been discussing makes no attempt to reproduce human intelligence. Quite the contrary: It attempts to make an end run around human intelligence, and the output passages exhibited above clearly reveal its giant lacunas. From my point of view, there is no fundamental reason that machines could not, in principle, someday think, be creative, funny, nostalgic, excited, frightened, ecstatic, resigned, hopeful, and, as a corollary, able to translate admirably between languages. 
There’s no fundamental reason that machines might not someday succeed smashingly in translating jokes, puns, screenplays, novels, poems, and, of course, essays like this one. But all that will come about only when machines are as filled with ideas, emotions, and experiences as human beings are. And that’s not around the corner. Indeed, I believe it is still extremely far away. At least that is what this lifelong admirer of the human mind’s profundity fervently hopes. When, one day, a translation engine crafts an artistic novel in verse in English, using precise rhyming iambic tetrameter rich in wit, pathos, and sonic verve, then I’ll know it’s time for me to tip my hat and bow out.

This article originally misstated the number of languages for which the deep-learning version of Google Translate is available. We regret the error.

Douglas Hofstadter is a professor of cognitive science and comparative literature at Indiana University at Bloomington. He is the author of Gödel, Escher, Bach.

Machine Translation: Mining Text for Social Theory
James A. Evans and Pedro Aceves
Annual Review of Sociology, Vol. 42:21-50 (July 2016). First published online as a Review in Advance on June 1, 2016. https://doi.org/10.1146/annurev-soc-081715-074206

Abstract: More of the social world lives within electronic text than ever before, from collective activity on the web, social media, and instant messaging to online transactions, government intelligence, and digitized libraries. This supply of text has elicited demand for natural language processing and machine learning tools to filter, search, and translate text into valuable data. We survey some of the most exciting computational approaches to text analysis, highlighting both supervised methods that extend old theories to new data and unsupervised techniques that discover hidden regularities worth theorizing. We then review recent research that uses these tools to develop social insight by exploring (a) collective attention and reasoning through the content of communication; (b) social relationships through the process of communication; and (c) social states, roles, and moves identified through heterogeneous signals within communication. We highlight social questions for which these advances could offer powerful new insight.

Keywords: content analysis, big data, natural language processing, machine learning, text analysis, computational methods, grounded theory

Machine translation
From Wikipedia, the free encyclopedia

Machine translation, sometimes referred to by the abbreviation MT (not to be confused with computer-aided translation, machine-aided human translation (MAHT) or interactive translation), is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another. On a basic level, MT performs simple substitution of words in one language for words in another, but that alone usually cannot produce a good translation of a text, because recognition of whole phrases and their closest counterparts in the target language is needed.
Solving this problem with corpus statistical and neural techniques is a rapidly growing field that is leading to better translations, handling differences in linguistic typology, translation of idioms, and the isolation of anomalies.^[1]

Current machine translation software often allows for customization by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than conversation or less standardised text. Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are proper names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators and, in a very limited number of cases, can even produce output that can be used as is (e.g., weather reports).

The progress and potential of machine translation have been much debated throughout its history. Since the 1950s, a number of scholars have questioned the possibility of achieving fully automatic machine translation of high quality, first and most notably Yehoshua Bar-Hillel.^[2] Some critics claim that there are in-principle obstacles to automating the translation process.^[3]

History
Main article: History of machine translation

The idea of machine translation may be traced back to the 17th century. In 1629, René Descartes proposed a universal language, with equivalent ideas in different tongues sharing one symbol.^[4] The field of "machine translation" appeared in Warren Weaver's Memorandum on Translation (1949).^[5] The first researcher in the field, Yehoshua Bar-Hillel, began his research at MIT (1951). A Georgetown University MT research team followed (1951) with a public demonstration of its Georgetown-IBM experiment system in 1954. MT research programs popped up in Japan^[6]^[7] and Russia (1955), and the first MT conference was held in London (1956).^[8]^[9] Researchers continued to join the field as the Association for Machine Translation and Computational Linguistics was formed in the U.S. (1962) and the National Academy of Sciences formed the Automatic Language Processing Advisory Committee (ALPAC) to study MT (1964). Real progress was much slower, however, and after the ALPAC report (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced.^[10] According to a 1972 report by the Director of Defense Research and Engineering (DDR&E), the feasibility of large-scale MT was reestablished by the success of the Logos MT system in translating military manuals into Vietnamese during that conflict.

The French Textile Institute also used MT to translate abstracts from and into French, English, German and Spanish (1970); Brigham Young University started a project to translate Mormon texts by automated translation (1971); and Xerox used SYSTRAN to translate technical manuals (1978). Beginning in the late 1980s, as computational power increased and became less expensive, more interest was shown in statistical models for machine translation. Various MT companies were launched, including Trados (1984), which was the first to develop and market translation memory technology (1989).
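(Translation memory, mentioned just above, is conceptually simple: store previously translated sentence pairs and, for each new source sentence, retrieve the stored sentence that most resembles it, so the translator can reuse or adapt its stored translation. Here is a minimal sketch in Python; the sentence pairs and the 0.7 cutoff are invented for illustration.)

# Minimal sketch of translation-memory lookup: fuzzy-match a new
# sentence against previously translated (source, target) pairs.
from difflib import SequenceMatcher

memory = [  # invented example pairs
    ("The engine must be serviced every 500 hours.",
     "Le moteur doit être révisé toutes les 500 heures."),
    ("Check the oil level before starting.",
     "Vérifiez le niveau d'huile avant le démarrage."),
]

def best_match(sentence, threshold=0.7):
    scored = [(SequenceMatcher(None, sentence, src).ratio(), src, tgt)
              for src, tgt in memory]
    score, src, tgt = max(scored)
    # Below the threshold the match is too loose to reuse, and the
    # translator starts from scratch instead.
    return (src, tgt, round(score, 2)) if score >= threshold else None

print(best_match("The engine must be serviced every 250 hours."))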
The first commercial MT system for Russian / English / German-Ukrainian was developed at Kharkov State University (1991). MT on the web started with SYSTRAN offering free translation of small texts (1996), followed by AltaVista Babelfish, which racked up 500,000 requests a day (1997). Franz-Josef Och (the future head of Translation Development at Google) won DARPA's speed MT competition (2003). More innovations during this time included MOSES, the open-source statistical MT engine (2007), a text/SMS translation service for mobiles in Japan (2008), and a mobile phone with built-in speech-to-speech translation functionality for English, Japanese and Chinese (2009). In 2012, Google announced that Google Translate translates roughly enough text to fill 1 million books in one day.

The idea of using digital computers for translation of natural languages was proposed as early as 1946 by A. D. Booth and possibly others. Warren Weaver wrote an important memorandum, "Translation," in 1949. The Georgetown experiment was by no means the first such application; a demonstration of a rudimentary translation of English into French was made in 1954 on the APEXC machine at Birkbeck College (University of London). Several papers on the topic were published at the time, and even articles in popular journals (see for example Wireless World, Sept. 1955, Cleave and Zacharov). A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.

Translation process
Main article: Translation process

The human translation process may be described as:
1. Decoding the meaning of the source text; and
2. Re-encoding this meaning in the target language.

Behind this ostensibly simple procedure lies a complex cognitive operation. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the grammar, semantics, syntax, idioms, etc., of the source language, as well as the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.

Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the target language that sounds as if it has been written by a person. In its most general application, this is beyond current technology. Though it works much faster, no automated translation program or procedure, with no human participation, can produce output even close to the quality a human translator can produce. What it can do, however, is provide a general, though imperfect, approximation of the original text, getting the "gist" of it (a process called "gisting"). This is sufficient for many purposes, including making best use of the finite and expensive time of a human translator, reserved for those cases in which total accuracy is indispensable.

This problem may be approached in a number of ways, through the evolution of which accuracy has improved.

Approaches

(Figure: Bernard Vauquois' pyramid, showing comparative depths of intermediary representation, with interlingual machine translation at the peak, followed by transfer-based, then direct translation.)
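(To make the bottom of the pyramid concrete, here is a deliberately naive sketch, in Python, of direct translation: pure word-for-word dictionary substitution. The tiny English-to-French glossary is invented for illustration; the output shows why recognition of whole phrases, word order, and agreement is needed.)

# Deliberately naive "direct" translation: word-for-word substitution.
# The tiny English-to-French glossary is invented for illustration.
glossary = {
    "the": "le", "black": "noir", "cat": "chat",
    "sees": "voit", "white": "blanc", "dog": "chien",
}

def direct_translate(sentence: str) -> str:
    words = sentence.lower().rstrip(".").split()
    return " ".join(glossary.get(w, w) for w in words)

print(direct_translate("The black cat sees the white dog."))
# Prints "le noir chat voit le blanc chien": the French word order
# (these adjectives follow the noun) and gender agreement are lost,
# which is why substitution alone cannot produce a good translation.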
Machine translation can use a method based on linguistic rules, which means that words will be translated in a linguistic way: the most suitable words of the target language will replace the ones in the source language. It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first.^[11]

Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.

Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by the other native speaker. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods. But then, the grammar methods need a skilled linguist to carefully design the grammar that they use. To translate between closely related languages, the technique referred to as rule-based machine translation may be used.

Rule-based
Main article: Rule-based machine translation

The rule-based machine translation paradigm includes transfer-based machine translation, interlingual machine translation and dictionary-based machine translation paradigms. This type of translation is used mostly in the creation of dictionaries and grammar programs. Unlike other methods, RBMT involves more information about the linguistics of the source and target languages, using the morphological and syntactic rules and semantic analysis of both languages. The basic approach involves linking the structure of the input sentence with the structure of the output sentence using a parser and an analyzer for the source language, a generator for the target language, and a transfer lexicon for the actual translation. RBMT's biggest downfall is that everything must be made explicit: orthographical variation and erroneous input must be made part of the source language analyser in order to cope with it, and lexical selection rules must be written for all instances of ambiguity. Adapting to new domains in itself is not that hard, as the core grammar is the same across domains, and the domain-specific adjustment is limited to lexical selection adjustment.

Transfer-based machine translation
Main article: Transfer-based machine translation

Transfer-based machine translation is similar to interlingual machine translation in that it creates a translation from an intermediate representation that simulates the meaning of the original sentence. Unlike interlingual MT, it depends partially on the language pair involved in the translation.

Interlingual
Main article: Interlingual machine translation

Interlingual machine translation is one instance of rule-based machine-translation approaches. In this approach, the source language, i.e. the text to be translated, is transformed into an interlingual language, i.e. a "language neutral" representation that is independent of any language. The target language is then generated out of the interlingua.
One of the major advantages of this system is that the interlingua becomes more valuable as the number of target languages it can be turned into increases. However, the only interlingual machine translation system that has been made operational at the commercial level is the KANT system (Nyberg and Mitamura, 1992), which is designed to translate Caterpillar Technical English (CTE) into other languages.

Dictionary-based
Main article: Dictionary-based machine translation

Machine translation can use a method based on dictionary entries, which means that the words will be translated as they are by a dictionary.

Statistical
Main article: Statistical machine translation

Statistical machine translation tries to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament, and EUROPARL, the record of the European Parliament. Where such corpora are available, good results can be achieved translating similar texts, but such corpora are still rare for many language pairs. The first statistical machine translation software was CANDIDE from IBM. Google used SYSTRAN for several years, but switched to a statistical translation method in October 2007.^[12] In 2005, Google improved its internal translation capabilities by using approximately 200 billion words from United Nations materials to train their system; translation accuracy improved.^[13] Google Translate and similar statistical translation programs work by detecting patterns in hundreds of millions of documents that have previously been translated by humans and making intelligent guesses based on the findings. Generally, the more human-translated documents available in a given language, the more likely it is that the translation will be of good quality.^[14] Newer approaches in statistical machine translation, such as METIS II and PRESEMT, use minimal corpus size and instead focus on derivation of syntactic structure through pattern recognition. With further development, this may allow statistical machine translation to operate off of a monolingual text corpus.^[15] SMT's biggest downfalls include its dependence on huge amounts of parallel texts, its problems with morphology-rich languages (especially with translating into such languages), and its inability to correct singleton errors.

Example-based
Main article: Example-based machine translation

The example-based machine translation (EBMT) approach was proposed by Makoto Nagao in 1984.^[16]^[17] Example-based machine translation is based on the idea of analogy. In this approach, the corpus that is used is one that contains texts that have already been translated. Given a sentence that is to be translated, sentences from this corpus are selected that contain similar sub-sentential components.^[18] The similar sentences are then used to translate the sub-sentential components of the original sentence into the target language, and these phrases are put together to form a complete translation.

Hybrid MT
Main article: Hybrid machine translation

Hybrid machine translation (HMT) leverages the strengths of statistical and rule-based translation methodologies.^[19] Several MT organizations (such as Omniscien Technologies (formerly Asia Online), LinguaSys, Systran, and Polytechnic University of Valencia) claim a hybrid approach that uses both rules and statistics.
The approaches differ in a number of ways:

Rules post-processed by statistics: Translations are performed using a rules-based engine. Statistics are then used in an attempt to adjust/correct the output from the rules engine.

Statistics guided by rules: Rules are used to pre-process data in an attempt to better guide the statistical engine. Rules are also used to post-process the statistical output to perform functions such as normalization. This approach has much more power, flexibility and control when translating. It also provides extensive control over the way in which the content is processed during both pre-translation (e.g. markup of content and non-translatable terms) and post-translation (e.g. post-translation corrections and adjustments).

More recently, with the advent of neural MT, a new version of hybrid machine translation is emerging that combines the benefits of rules, statistical and neural machine translation. The approach allows benefitting from pre- and post-processing in a rule-guided workflow as well as benefitting from NMT and SMT. The downside is the inherent complexity, which makes the approach suitable only for specific use cases. One of the proponents of this approach for complex use cases is Omniscien Technologies.

Neural MT
Main article: Neural machine translation

A deep-learning-based approach to MT, neural machine translation has made rapid progress in recent years, and Google has announced that its translation services are now using this technology in preference to its previous statistical methods.^[20] Other providers, including Pangeanic^[21], KantanMT^[22], Omniscien Technologies^[23] and SDL^[24], have announced the deployment of neural machine translation technology in 2017 as well.

Major issues

(Figure: Broken Chinese "沒有進入" produced by machine translation on a sign in Bali, Indonesia; the broken sentence sounds like "there does not exist an entry" or "have not entered yet." Machine translation can produce such non-understandable phrases.)

Disambiguation
Main articles: Word sense disambiguation and Syntactic disambiguation

Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel.^[25] He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word.^[26] Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches. Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.^[27]

Claude Piron, a long-time translator for the United Nations and the World Health Organization, wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved:

Why does a translator need a whole workday to translate five pages, and not an hour or two? … About 90% of an average text corresponds to these simple conditions. But unfortunately, there's the other 10%. It's that part that requires six [more] hours of work. There are ambiguities one has to resolve.
For instance, the author of the source text, an Australian physician, cited the example of an epidemic which was declared during World War II in a "Japanese prisoner of war camp". Was he talking about an American camp with Japanese prisoners or a Japanese camp with American prisoners? The English has two senses. It's necessary therefore to do research, maybe to the extent of a phone call to Australia.^[28]

The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own; but this would require a higher degree of AI than has yet been attained. A shallow approach which simply guessed at the sense of the ambiguous English phrase that Piron mentions (based, perhaps, on which kind of prisoner-of-war camp is more often mentioned in a given corpus) would have a reasonable chance of guessing wrong fairly often. A shallow approach that involves "ask the user about each ambiguity" would, by Piron's estimate, only automate about 25% of a professional translator's job, leaving the harder 75% still to be done by a human.

Non-standard speech

One of the major pitfalls of MT is its inability to translate non-standard language with the same accuracy as standard language. Heuristic or statistics-based MT takes input from various sources in the standard form of a language. Rule-based translation, by nature, does not include common non-standard usages. This causes errors in translation from a vernacular source or into colloquial language. Limitations on translation from casual speech present issues in the use of machine translation in mobile devices.

Named entities

Named-entity handling in MT is related to named entity recognition in information extraction. Named entities, in a narrow sense, refer to concrete or abstract entities in the real world, including people, organizations, companies, and places; the term also covers expressions of time, space, and quantity, such as "1 July 2011" or "$79.99".^[29]

Named entities occur in the text being analyzed in statistical machine translation. The initial difficulty that arises in dealing with named entities is simply identifying them in the text. Consider the list of names common in a particular language to illustrate this: the most common names are different for each language and are also constantly changing. If named entities cannot be recognized by the machine translator, they may be erroneously translated as common nouns, which would most likely not affect the BLEU rating of the translation but would change the text's human readability.^[30] It is also possible that, when not identified, named entities will be omitted from the output translation, which would also have implications for the text's readability and message.

Another way to deal with named entities is to use transliteration instead of translation, meaning that you find the letters in the target language that most closely correspond to the name in the source language. There have been attempts to incorporate this into machine translation by adding a transliteration step into the translation procedure. However, these attempts still have their problems and have even been cited as worsening the quality of translation.^[31] Named entities were still identified incorrectly, with words not being transliterated when they should be, or being transliterated when they shouldn't be. For example, for "Southern California" the first word should be translated directly, while the second word should be transliterated.
The lack of attention to the issue of named entity translation has been recognized as stemming potentially from a lack of resources to devote to the task, in addition to the complexity of creating a good system for it. One approach to named entity translation has been to transliterate, rather than translate, those words. A second is to create a "do-not-translate" list, which has the same end goal: transliteration as opposed to translation.^[32] Both of these approaches still rely on the correct identification of named entities, however.

A third approach is a class-based model. Named entities are replaced with a token representing the class they belong to; for example, "Ted" and "Erica" would both be replaced with a "person" class token. In this way the statistical distribution and use of person names in general can be analyzed, instead of looking at the distributions of "Ted" and "Erica" individually. A problem the class-based model solves is that the probability of a given name in a specific language will not distort the assigned probability of a translation. A Stanford study on improving this area of translation gives the example that different probabilities will be assigned to "David is going for a walk" and "Ankit is going for a walk" with English as the target language, owing to the different number of occurrences of each name in the training data. A frustrating outcome of the same study (and of other attempts to improve named entity translation) is that including methods for named entity translation will often decrease the BLEU scores of the translation.^[32]

Translation from multiparallel sources[edit]
Some work has been done in the utilization of multiparallel corpora, that is, a body of text that has been translated into three or more languages. Using these methods, a text that has been translated into two or more languages may be utilized in combination to provide a more accurate translation into a third language than if just one of those source languages were used alone.^[33]^[34]^[35]

Ontologies in MT[edit]
An ontology is a formal representation of knowledge that includes the concepts (such as objects and processes) in a domain and some relations between them. If the stored information is of a linguistic nature, one can speak of a lexicon.^[36] In NLP, ontologies can be used as a source of knowledge for machine translation systems. With access to a large knowledge base, systems can be enabled to resolve many ambiguities (especially lexical ones) on their own. In the following classic examples, as humans we are able to interpret the prepositional phrase according to the context, because we use our world knowledge, stored in our lexicons: "I saw a man/star/molecule with a microscope/telescope/binoculars."^[36] A machine translation system would initially be unable to differentiate between the meanings, because the syntax does not change. With a large enough ontology as a source of knowledge, however, the possible interpretations of ambiguous words in a specific context can be reduced.
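A small, runnable taste of using a lexical knowledge base this way is NLTK's implementation of the Lesk algorithm, which scores each WordNet sense of a word by the overlap between its dictionary gloss and the surrounding sentence. This is a shallow dictionary-overlap heuristic, not the full ontology-driven reasoning described above, and it often guesses wrong; it merely shows the mechanics.

```python
# Disambiguating "bank" with NLTK's Lesk implementation over WordNet.
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.wsd import lesk

sent1 = "I deposited the cheque at the bank".split()
sent2 = "We fished from the grassy bank of the river".split()

for sent in (sent1, sent2):
    sense = lesk(sent, "bank")          # returns the best-overlapping Synset
    print(sense, "->", sense.definition())
```

The limits show quickly: Lesk has no world knowledge beyond the glosses, so sentences whose context words never appear in any definition defeat it, which is exactly why larger ontologies with explicit relations are attractive.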
Other areas of usage for ontologies within NLP include information retrieval, information extraction, and text summarization.^[36]

Building ontologies[edit]
The ontology generated for the PANGLOSS knowledge-based machine translation system in 1993 may serve as an example of how an ontology for NLP purposes can be compiled:^[37]

A large-scale ontology is necessary for parsing in the active modules of the machine translation system. In the PANGLOSS example, about 50,000 nodes were intended to be subsumed under the smaller, manually built upper (abstract) region of the ontology. Because of its size, it had to be created automatically. The goal was to merge two resources, LDOCE online and WordNet, so as to combine the benefits of both: concise definitions from Longman, and semantic relations from WordNet allowing for semi-automatic taxonomization into the ontology.

+ A definition match algorithm was created to automatically merge the correct meanings of ambiguous words between the two online resources, based on the words that the definitions of those meanings have in common in LDOCE and WordNet. Using a similarity matrix, the algorithm delivered matches between meanings, including a confidence factor. This algorithm alone, however, did not match all meanings correctly on its own.
+ A second, hierarchy match algorithm was therefore created, which uses the taxonomic hierarchies found in WordNet (deep hierarchies) and partially in LDOCE (flat hierarchies). This works by first matching unambiguous meanings, then limiting the search space to only the respective ancestors and descendants of those matched meanings. Thus the algorithm matched locally unambiguous meanings: for instance, while the word "seal" as such is ambiguous, there is only one meaning of "seal" within the animal subhierarchy.

Both algorithms complemented each other and helped construct a large-scale ontology for the machine translation system. The WordNet hierarchies, coupled with the matching definitions of LDOCE, were subordinated to the ontology's upper region. As a result, the PANGLOSS MT system was able to make use of this knowledge base, mainly in its generation element. A toy rendering of the definition-match idea follows.
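As promised, here is a toy rendering of the definition-match idea: two senses are matched by the overlap of the content words of their definitions, yielding a score that can serve as a confidence factor. The glosses below are invented stand-ins, not the real LDOCE or WordNet entries.

```python
# Toy "definition match": score candidate sense pairings by the Jaccard
# overlap of the content words in their dictionary definitions.
STOPWORDS = {"a", "an", "the", "of", "or", "that", "to", "in", "on", "and", "any"}

def content_words(definition):
    return {w for w in definition.lower().split() if w not in STOPWORDS}

def match_score(def_a, def_b):
    """Jaccard overlap between the content words of two definitions."""
    a, b = content_words(def_a), content_words(def_b)
    return len(a & b) / len(a | b)

ldoce_seal = "a large sea animal that eats fish and lives in coastal waters"
wn_seal_animal = "any of numerous marine mammals living in coastal waters that eat fish"
wn_seal_stamp = "a device incised to make an impression on wax or paper"

for name, gloss in [("animal", wn_seal_animal), ("stamp", wn_seal_stamp)]:
    print(name, round(match_score(ldoce_seal, gloss), 2))
# animal 0.23 (shares "fish", "coastal", "waters"); stamp 0.0
```

Overlap alone cannot match everything, which is precisely why PANGLOSS added the second, hierarchy-based algorithm on top of it.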
Applications[edit]
While no system provides the holy grail of fully automatic high-quality machine translation of unrestricted text, many fully automated systems produce reasonable output.^[38]^[39]^[40] The quality of machine translation is substantially improved if the domain is restricted and controlled.^[41] Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the European Commission. The MOLTO project, for example, coordinated by the University of Gothenburg, received more than 2.375 million euros in EU project support to create a reliable translation tool covering a majority of the EU languages.^[42] The further development of MT systems comes at a time when budget cuts in human translation may increase the EU's dependency on reliable MT programs.^[43] The European Commission contributed 3.072 million euros (via its ISA programme) to the creation of MT@EC, a statistical machine translation program tailored to the administrative needs of the EU, replacing a previous rule-based machine translation system.^[44]

In 2005, Google claimed that promising results had been obtained using a proprietary statistical machine translation engine.^[45] In tests conducted by the National Institute of Standards and Technology in summer 2006, the statistical translation engine used in the Google language tools for Arabic-English and Chinese-English achieved an overall BLEU-4 score of 0.4281, ahead of runner-up IBM's 0.3954.^[46]^[47]^[48]

With the recent focus on terrorism, military sources in the United States have been investing significant amounts of money in natural language engineering. In-Q-Tel^[49] (a venture capital fund, largely funded by the US intelligence community to stimulate new technologies through private-sector entrepreneurs) brought up companies like Language Weaver. Currently the military community is interested in the translation and processing of languages such as Arabic, Pashto, and Dari.^[citation needed] Within these languages, the focus is on key phrases and quick communication between military members and civilians through the use of mobile-phone apps.^[50] The Information Processing Technology Office at DARPA hosts programs like TIDES and the Babylon translator. The US Air Force has awarded a $1 million contract to develop a language translation technology.^[51]

The notable rise of social networking on the web in recent years has created yet another niche for the application of machine translation software: in social networking utilities and in instant messaging clients such as Skype, Google Talk, and MSN Messenger, allowing users speaking different languages to communicate with each other. Machine translation applications have also been released for most mobile devices, including mobile telephones, pocket PCs, and PDAs. Due to their portability, such instruments have come to be designated as mobile translation tools, enabling mobile business networking between partners speaking different languages, facilitating foreign-language learning, and enabling unaccompanied travel to foreign countries without the intermediation of a human translator.

Despite being labelled an unworthy competitor to human translation in 1966 by the Automated Language Processing Advisory Committee put together by the United States government,^[52] the quality of machine translation has now improved to such levels that its application in online collaboration and in the medical field is being investigated. The application of this technology in medical settings where human translators are absent is another topic of research, but difficulties arise from the importance of accurate translations in medical diagnoses.^[53]

Evaluation[edit]
Main article: Evaluation of machine translation
There are many factors that affect how machine translation systems are evaluated.
These factors include the intended use of the translation, the nature of the machine translation software, and the nature of the translation process. Different programs may work well for different purposes. For example, statistical machine translation (SMT) typically outperforms example-based machine translation (EBMT), but researchers found that when evaluating English-to-French translation, EBMT performed better.^[54] The same concept applies to technical documents, which, because of their formal language, can be more easily translated by SMT. In certain applications, however, e.g., product descriptions written in a controlled language, a dictionary-based machine translation system has produced satisfactory translations that require no human intervention save for quality inspection.^[55]

There are various means for evaluating the output quality of machine translation systems. The oldest is the use of human judges^[56] to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable method for comparing different systems, such as rule-based and statistical systems.^[57] Automated means of evaluation include BLEU, NIST, METEOR, and LEPOR.^[58]

Relying exclusively on unedited machine translation ignores the fact that communication in human language is context-embedded, and that it takes a person to comprehend the context of the original text with a reasonable degree of probability. It is certainly true that even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be useful to a human being and that publishable-quality translation is achieved, such translations must be reviewed and edited by a human.^[59] The late Claude Piron wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved. Such research is a necessary prelude to the pre-editing needed to provide input for machine-translation software such that the output will not be meaningless.^[60]

In addition to disambiguation problems, decreased accuracy can occur due to varying levels of training data for machine-translation programs. Both example-based and statistical machine translation rely on a vast array of real example sentences as a base for translation, and when too many or too few sentences are analyzed, accuracy is jeopardized. Researchers found that when a program is trained on 203,529 sentence pairings, accuracy actually decreases.^[54] The optimal level of training data seems to be just over 100,000 sentences, possibly because as the training data grows, the number of possible sentences grows too, making it harder to find an exact translation match.

Using machine translation as a teaching tool[edit]
Although there have been concerns about machine translation's accuracy, Dr. Ana Nino of the University of Manchester has researched some of the advantages of utilizing machine translation in the classroom. One such pedagogical method is called "MT as a Bad Model."^[61] MT as a Bad Model forces the language learner to identify inconsistencies or incorrect aspects of a translation; in turn, the individual will (it is hoped) gain a better grasp of the language. Dr. Nino notes that this teaching tool was implemented in the late 1980s.
At the end of various semesters, Dr. Nino was able to obtain survey results from students who had used MT as a Bad Model (as well as other models). Overwhelmingly, students felt that they had observed improved comprehension, lexical retrieval, and increased confidence in their target language.^[61]

Machine translation and signed languages[edit]
Main article: Machine translation of sign languages
In the early 2000s, options for machine translation between spoken and signed languages were severely limited. It was a common belief that deaf individuals could use traditional translators. However, stress, intonation, pitch, and timing are conveyed much differently in spoken languages than in signed languages. Therefore, a deaf individual may misinterpret or become confused about the meaning of written text that is based on a spoken language.^[62] Researchers Zhao et al. (2000) developed a prototype called TEAM (translation from English to ASL by machine) that performed English to American Sign Language (ASL) translations. The program would first analyze the syntactic, grammatical, and morphological aspects of the English text. Following this step, the program accessed a sign synthesizer, which acted as a dictionary for ASL. This synthesizer housed the process one must follow to complete ASL signs, as well as the meanings of those signs. Once the entire text was analyzed and the signs necessary to complete the translation were located in the synthesizer, a computer-generated human appeared and used ASL to sign the English text to the user.^[62]

Copyright[edit]
Only works that are original are subject to copyright protection, so some scholars claim that machine translation results are not entitled to copyright protection because MT does not involve creativity.^[63] The copyright at issue is for a derivative work; the author of the original work in the original language does not lose his rights when a work is translated: a translator must have permission to publish a translation.

See also[edit]
Comparison of machine translation applications
Statistical machine translation
Controlled language in machine translation
Cache language model
Computational linguistics
Universal Networking Language
Computer-assisted translation and Translation memory
Foreign language writing aid
Controlled natural language
Fuzzy matching
Postediting
History of machine translation
Human language technology
Humour in translation ("howlers")
Language and Communication Technologies
Language barrier
List of emerging technologies
List of research laboratories for machine translation
Neural machine translation
Pseudo-translation
Round-trip translation
Translation
Translation memory
Universal translator
Phraselator
Mobile translation
ULTRA (machine translation system)
Comparison of different machine translation approaches
OpenLogos

Notes[edit]
1. ^ Albat, Thomas Fritz. "Systems and Methods for Automatically Estimating a Translation Time." US Patent 0185235, 19 July 2012.
2. ^ Bar-Hillel, Yehoshua (1964). Language and Information: Selected Essays on Their Theory and Application. Reading, MA: Addison-Wesley. pp. 174–179.
3. ^ "Madsen, Mathias: The Limits of Machine Translation (2010)". Docs.google.com. Retrieved 2012-06-12.
4. ^ 浜口稔 (30 April 1993). 英仏普遍言語計画 [Universal Language Schemes in England and France] (in Japanese). 工作舎. pp. 70–71. ISBN 978-4-87502-214-5.
"普遍的文字の構築という初期の試みに言及するときは1629年11月にデカルトがメルセンヌに宛てた手紙から始まる、というのが通り相場とな っている。しかし、この問題への関心を最初に誘発した多くの要因を吟味してみると、ある種の共通の書字という構想は明らかに、ずっと以前から比 較的なじみ深いものになっていたようである。…フランシス・ベイコンは、1605年出版の学問の進歩についてのなかで、そのような真正の文字の 体系は便利であると述べていた"translated from Knowlson, James. UNIVERSAL LANGUAGE SCHEMES IN ENGLAND AND FRANCE 1600-1800. ^ Delavenay, Émile. LA MACHINE A TRADUIRE (Collection QUE SAIS-JE? No.834). Translated by 別所照彦. Presses Universitaires de France. "英国人A.D.ブースとロックフェラー財団のワレン・ウィーバーとが同時に翻訳問題に手をつけたのは1946年のことであった。(translati on (assisted by Google translate):It was in 1946 when the English A. D. Booth and Warren Weaver at Rockefeller Foundation begun to study the issue on translation at the same time.)" ^ 上野, 俊夫 (1986-08-13). パーソナルコンピュータによる機械翻訳プログラムの制作 (in Japanese). Tokyo: (株)ラッセル社. p. 16. ISBN 494762700X. "わが国では1956年、当時の電気試験所が英和翻訳専用機「ヤマト」を実験している。この機械は1962年頃には中学1年の教科書で90点以上の能力 に達したと報告されている。(translation (assisted by Google translate): In 1959 Japan, the National Institute of Advanced Industrial Science and Technology(AIST) tested the proper English-Japanese translation machine Yamato, which reported in 1964 as that reached the power level over the score of 90-point on the textbook of 1st grade of junior hi-school.)" ^ http://museum.ipsj.or.jp/computer/dawn/0027.html ^ Nye, Mary Jo (2016). "Speaking in Tongues: Science's centuries-long hunt for a common language". Distillations. 2 (1): 40–43. Retrieved 20 March 2018. ^ Gordin, Michael D. (2015). Scientific Babel: How Science Was Done Before and After Global English. Chicago, Illinois: University of Chicago Press. ISBN 9780226000299. ^ 上野, 俊夫 (1986-08-13). パーソナルコンピュータによる機械翻訳プログラムの制作 (in Japanese). Tokyo: (株)ラッセル社. p. 16. ISBN 494762700X. ^ John Lehrberger (1988). Machine Translation: Linguistic Characteristics of MT Systems and General Methodology of Evaluation. John Benjamins Publishing. ISBN 90-272-3124-9. ^ Chitu, Alex (22 October 2007). "Google Switches to Its Own Translation System". Googlesystem.blogspot.com. Retrieved 2012-08-13. ^ "Google Translator: The Universal Language". Blog.outer-court.com. 25 January 2007. Retrieved 2012-06-12. ^ "Inside Google Translate – Google Translate". ^ http://www.mt-archive.info/10/HyTra-2013-Tambouratzis.pdf ^ Nagao, M. 1981. A Framework of a Mechanical Translation between Japanese and English by Analogy Principle, in Artificial and Human Intelligence, A. Elithorn and R. Banerji (eds.) North- Holland, pp. 173–180, 1984. ^ "the Association for Computational Linguistics – 2003 ACL Lifetime Achievement Award". Association for Computational Linguistics. Retrieved 2010-03-10. ^ http://kitt.cl.uzh.ch/clab/satzaehnlichkeit/tutorial/Unterlagen/Somers1 999.pdf ^ Adam Boretz. "Boretz, Adam, "AppTek Launches Hybrid Machine Translation Software" SpeechTechMag.com (posted 2 MAR 2009)". Speechtechmag.com. Retrieved 2012-06-12. ^ "Google's neural network learns to translate languages it hasn't been trained on". ^ "EU Spends EUR 1.9m to Customize MT for State and Regional Authorities | Slator". Slator. 2017-07-09. Retrieved 2017-07-09. ^ "KantanMT Users Can Now Customise and Deploy Neural Machine Translation Engines | Slator". Slator. 2017-03-13. Retrieved 2017-06-23. ^ "Omniscien Technologies Announces Release of Language Studio™ with Next-Generation NMT Technology | Slator". Slator. 2017-04-21. Retrieved 2017-06-23. ^ Rowe, Sam Del (2017-06-12). "SDL Adds Neural Machine Translation to Its Enterprise Translation Server". CRM Magazine. Retrieved 2017-06-23. 
25. ^ Hutchins, John. "Milestones in machine translation – No. 6: Bar-Hillel and the nonfeasibility of FAHQT". Archived 12 March 2007 at the Wayback Machine.
26. ^ Bar-Hillel (1960), "Automatic Translation of Languages". Available online at http://www.mt-archive.info/Bar-Hillel-1960.pdf
27. ^ Costa-jussà, Marta R.; Rapp, Reinhard; Lambert, Patrik; Eberle, Kurt; Banchs, Rafael E.; Babych, Bogdan (eds.). Hybrid Approaches to Machine Translation. Switzerland. ISBN 9783319213101. OCLC 953581497.
28. ^ Piron, Claude. Le défi des langues (The Language Challenge). Paris: L'Harmattan, 1994.
29. ^ 张政 (2010). 计算机语言学与机器翻译导论 [Introduction to Computational Linguistics and Machine Translation]. 外语教学与研究出版社 (Foreign Language Teaching and Research Press).
30. ^ http://www.cl.cam.ac.uk/~ar283/eacl03/workshops03/W03-w1_eacl03babych.local.pdf
31. ^ Hermjakob, U.; Knight, K.; Daumé III, H. (2008). "Name Translation in Statistical Machine Translation: Learning When to Transliterate". Association for Computational Linguistics. pp. 389–397.
32. ^ a b http://nlp.stanford.edu/courses/cs224n/2010/reports/singla-nirajuec.pdf
33. ^ https://dowobeha.github.io/papers/amta08.pdf
34. ^ http://homepages.inf.ed.ac.uk/mlap/Papers/acl07.pdf
35. ^ https://www.jair.org/media/3540/live-3540-6293-jair.pdf
36. ^ a b c Vossen, Piek: "Ontologies". In: Mitkov, Ruslan (ed.) (2003): Handbook of Computational Linguistics, Chapter 25. Oxford: Oxford University Press.
37. ^ Knight, Kevin (1993). "Building a large ontology for machine translation" (PDF). Retrieved 7 September 2014.
38. ^ Melby, Alan (1995). The Possibility of Language. Amsterdam: Benjamins, pp. 27–41. Benjamins.com. Retrieved 2012-06-12.
39. ^ Wooten, Adam (14 February 2006). "A Simple Model Outlining Translation Technology". T&I Business. Tandibusiness.blogspot.com. Retrieved 2012-06-12.
40. ^ "Appendix III of 'The present status of automatic translation of languages', Advances in Computers, vol. 1 (1960), pp. 158–163. Reprinted in Y. Bar-Hillel: Language and Information (Reading, Mass.: Addison-Wesley, 1964), pp. 174–179" (PDF). Retrieved 2012-06-12.
41. ^ "Human quality machine translation solution by Ta with you" (in Spanish). Tauyou.com. 15 April 2009. Retrieved 2012-06-12.
42. ^ "molto-project.eu". Retrieved 2012-06-12.
43. ^ Spiegel Online, Hamburg (13 September 2013). "Google Translate Has Ambitious Goals for Machine Translation".
44. ^ "Machine Translation Service". 5 August 2011.
45. ^ Google Blog: "The machines do the translating" (by Franz Och).
46. ^ Geer, David. "Statistical Translation Gains Respect". IEEE Computer, October 2005, pp. 18–21. doi:10.1109/MC.2005.353. Retrieved 2012-06-12.
47. ^ Ratcliff, Evan. "Me Translate Pretty One Day". Wired, December 2006. Retrieved 2012-06-12.
48. ^ "NIST 2006 Machine Translation Evaluation Official Results", 1 November 2006. Itl.nist.gov. Retrieved 2012-06-12.
49. ^ "In-Q-Tel". Archived from the original on 20 May 2016. Retrieved 12 June 2012.
50. ^ Gallafent, Alex (26 April 2011). "Machine Translation for the Military". PRI's The World. Retrieved 17 September 2013.
51. ^ Jackson, William (9 September 2003). "Air force wants to build a universal translator". GCN. Retrieved 2012-06-12.
52. ^ http://www.nap.edu/html/alpac_lm/ARC000005.pdf
53. ^ "Using machine translation in clinical practice".
54. ^ a b Way, Andy; Gough, Nano (20 September 2005). "Comparing Example-Based and Statistical Machine Translation". Natural Language Engineering 11 (3): 295–309. doi:10.1017/S1351324905003888. Retrieved 2014-03-23.
55. ^ Muegge, Uwe (2006). "Fully Automatic High Quality Machine Translation of Restricted Text: A Case Study". In Translating and the Computer 28: Proceedings of the twenty-eighth international conference on translating and the computer, 16–17 November 2006, London. London: Aslib. ISBN 978-0-85142-483-5.
56. ^ "Comparison of MT systems by human evaluation, May 2008". Morphologic.hu. Archived from the original on 19 April 2012. Retrieved 12 June 2012.
57. ^ Anderson, D. D. (1995). "Machine translation as a tool in second language learning". CALICO Journal 13 (1): 68–96.
58. ^ Han et al. (2012). "LEPOR: A Robust Evaluation Metric for Machine Translation with Augmented Factors". In Proceedings of the 24th International Conference on Computational Linguistics (COLING 2012): Posters, pp. 441–450, Mumbai, India.
59. ^ J. M. Cohen observes (p. 14): "Scientific translation is the aim of an age that would reduce all activities to techniques. It is impossible however to imagine a literary-translation machine less complex than the human brain itself, with all its knowledge, reading, and discrimination."
60. ^ See the annually performed NIST tests since 2001, and Bilingual Evaluation Understudy.
61. ^ a b Nino, Ana. "Machine Translation in Foreign Language Learning: Language Learners' and Tutors' Perceptions of Its Advantages and Disadvantages". ReCALL: the Journal of EUROCALL 21 (2) (May 2009): 241–258.
62. ^ a b Zhao, L.; Kipper, K.; Schuler, W.; Vogler, C.; Palmer, M. (2000). "A Machine Translation System from English to American Sign Language". Lecture Notes in Computer Science 1934: 54–67.
63. ^ "Machine Translation: No Copyright On The Result?". SEO Translator, citing Zimbabwe Independent. Retrieved 24 November 2012.

Further reading[edit]
Cohen, J. M. (1986), "Translation", Encyclopedia Americana, 27, pp. 12–15.
Hutchins, W. John; Somers, Harold L. (1992). An Introduction to Machine Translation. London: Academic Press. ISBN 0-12-362830-X.
Lewis-Kraus, Gideon, "Tower of Babble", New York Times Magazine, June 7, 2015, pp. 48–52.

External links[edit]
The Advantages and Disadvantages of Machine Translation
International Association for Machine Translation (IAMT)
Machine Translation Archive by John Hutchins.
An electronic repository (and bibliography) of articles, books, and papers in the field of machine translation and computer-based translation technology.
Machine translation (computer-based translation) – publications by John Hutchins (includes PDFs of several books on machine translation).
Machine Translation and Minority Languages, John Hutchins, 1999.
What is Machine Translation?
Machine translation is the translation of text by a computer, with no human involvement. Pioneered in the 1950s, it is also referred to as automated, automatic, or instant translation.

Increase productivity and translate faster
Using machine translation as part of the SDL Trados Studio environment, you can translate more content and deliver it faster than before. SDL Trados Studio includes support for several machine translation engines.

How does machine translation work?
There are three types of machine translation system: rules-based, statistical, and neural.
Rules-based systems use a combination of language and grammar rules plus dictionaries for common words. Specialist dictionaries are created to focus on certain industries or disciplines. Rules-based systems typically deliver consistent translations with accurate terminology when trained with specialist dictionaries.
Statistical systems have no knowledge of language rules. Instead, they "learn" to translate by analysing large amounts of data for each language pair. They can be trained for specific industries or disciplines using additional data relevant to the sector. Typically, statistical systems deliver more fluent-sounding but less consistent translations.
Neural machine translation (NMT) makes machines learn to translate through one large neural network (multiple processing devices modeled on the brain). The approach has become increasingly popular among MT researchers and developers, as trained NMT systems have begun to show better translation performance in many language pairs than the phrase-based statistical approach.

When would I use machine translation?
When translating with SDL Trados Studio, any segments not leveraged from the translation memory can automatically be machine translated for a translator to review, then accept or amend if necessary, or translate manually instead. A translator can configure which machine translation engine to use and how much it is used. A sketch of this fallback logic appears below.

Respecting client confidentiality
If the projects you work on are commercially sensitive, your customer may require that information not be disclosed to any third parties. Carefully consider how and when to use machine translation, as you could be sharing segments of the source text with a third party. SDL Trados Studio automatically generates audit files that record the use of machine translation.
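The fallback logic just described can be sketched in a few lines. Everything here is hypothetical: the memory contents, the match threshold, and the machine_translate() stand-in. difflib's SequenceMatcher merely approximates the fuzzy matching a real CAT tool performs; this is not SDL's actual API.

```python
# Sketch of CAT-tool pre-translation: use the translation memory when a
# sufficiently close match exists, otherwise fall back to machine
# translation and flag the segment for human post-editing.
from difflib import SequenceMatcher

translation_memory = {
    "Click the Save button.": "Klicken Sie auf die Schaltfläche Speichern.",
}

def fuzzy_match(segment, threshold=0.85):
    """Return the best TM hit at or above the threshold, if any."""
    best = max(
        translation_memory,
        key=lambda src: SequenceMatcher(None, segment, src).ratio(),
        default=None,
    )
    if best and SequenceMatcher(None, segment, best).ratio() >= threshold:
        return translation_memory[best]
    return None

def machine_translate(segment):
    return f"<MT: {segment}>"   # stand-in for a real engine call

def pretranslate(segment):
    hit = fuzzy_match(segment)
    if hit:
        return hit, "from TM"
    return machine_translate(segment), "MT - needs post-editing"

print(pretranslate("Click the Save button."))
print(pretranslate("Restart the application."))
```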
What machine translation can I use?
SDL Trados Studio supports a number of machine translation engines available over an internet connection, including SDL Language Cloud, which is provided by SDL. You can choose between various options, from a free package of baseline (untrained) translation up to industry-specific trained engines. You can also use AdaptiveMT, SDL's self-learning machine translation, from within SDL Trados Studio; AdaptiveMT works via SDL Language Cloud MT and learns from your edits in real time as you translate.

What are the benefits to translators?
Increased productivity – deliver translations faster: pre-translate new segments that are not leveraged from the translation memory, or connect to a customer's or supplier's trained engine through SDL Language Cloud for better-quality, industry-specific results.
Flexibility and choice – to suit all types of project: select from a number of different machine translation engines, choose from over 100 languages and more than 2,500 language pairs, and compare the results of rules-based and statistical machine translation engines.

Discover AdaptiveMT
SDL AdaptiveMT is your own personal machine translation engine that adapts as you translate. Accessed directly within SDL Trados Studio, AdaptiveMT learns from your post-edits in real time to retain your style, tone, and content, saving time and minimizing future post-editing. SDL Language Cloud offers secure industry engines to which you can add your own terminology for high-quality output; by offering both machine and human translation, it combines your personal term dictionary with industry-specific, self-learning engines. Because AdaptiveMT learns from your previous translations, you do not have to mend the same issues repeatedly, which saves time and money by minimizing the amount of post-editing needed.
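At its simplest, the "learns from your edits in real time" idea reduces to consulting a store of past post-edits before calling the underlying engine. The class below is a conceptual sketch with invented names; it is not how AdaptiveMT is actually implemented.

```python
# Conceptual sketch of adaptive MT: remember the translator's post-edits
# so that a correction made once is applied automatically thereafter.
class AdaptiveEngine:
    def __init__(self, base_translate):
        self.base_translate = base_translate   # the underlying MT engine
        self.post_edits = {}                   # source segment -> approved edit

    def translate(self, segment):
        # Prefer a remembered human correction over raw MT output.
        return self.post_edits.get(segment) or self.base_translate(segment)

    def learn(self, segment, edited_translation):
        """Record the translator's post-edit for future reuse."""
        self.post_edits[segment] = edited_translation

engine = AdaptiveEngine(lambda s: f"<raw MT for: {s}>")
print(engine.translate("Low battery"))         # raw MT the first time
engine.learn("Low battery", "Batterie schwach")
print(engine.translate("Low battery"))         # the learned edit thereafter
```

Real adaptive systems generalize from edits to unseen but similar segments rather than matching exact strings, which is what makes them genuinely useful.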
What is Machine Translation? Rule-Based Machine Translation vs. Statistical Machine Translation
Machine translation (MT) is automated translation: the process by which computer software is used to translate a text from one natural language (such as English) to another (such as Spanish). For any translation, human or automated, the meaning of a text in the original (source) language must be fully restored in the target language. While on the surface this seems straightforward, it is far more complex. Translation is not mere word-for-word substitution. A translator must interpret and analyze all of the elements in the text and know how each word may influence another. This requires extensive expertise in grammar, syntax (sentence structure), and semantics (meaning) in both the source and target languages, as well as familiarity with each local region.

Human and machine translation each have their share of challenges. For example, no two individual translators will produce identical translations of the same text in the same language pair, and it may take several rounds of revision to satisfy the customer. The greater challenge, however, lies in how machine translation can produce translations of publishable quality.

Rule-Based Machine Translation Technology
Rule-based machine translation relies on innumerable built-in linguistic rules and millions of bilingual dictionary entries for each language pair. The software parses text and creates a transitional representation from which the text in the target language is generated. This process requires extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules. The software uses these complex rule sets to transfer the grammatical structure of the source language into the target language. Users can improve the out-of-the-box translation quality by adding their own terminology: they create user-defined dictionaries that override the system's default entries. In most cases there are two steps: an initial investment that significantly increases quality at a limited cost, and an ongoing investment to increase quality incrementally. While rule-based MT can bring companies to the quality threshold and beyond, the quality-improvement process may be long and expensive.

Statistical Machine Translation Technology
Statistical machine translation utilizes statistical translation models whose parameters stem from the analysis of monolingual and bilingual corpora. Building statistical translation models is a quick process, but the technology relies heavily on existing multilingual corpora: a minimum of 2 million words is required for a specific domain, and even more for general language. Theoretically it is possible to reach the quality threshold, but most companies do not have such large multilingual corpora from which to build the necessary translation models. Additionally, statistical machine translation is CPU-intensive and requires an extensive hardware configuration to run translation models at average performance levels.

Rule-Based MT vs. Statistical MT
Rule-based MT provides good out-of-domain quality and is by nature predictable. Dictionary-based customization guarantees improved quality and compliance with corporate terminology. But translation results may lack the fluency readers expect. In terms of investment, the customization cycle needed to reach the quality threshold can be long and costly, but performance is high even on standard hardware. Statistical MT provides good quality when large and qualified corpora are available. The translation is fluent, meaning it reads well and therefore meets user expectations. However, the translation is neither predictable nor consistent. Training from good corpora is automated and cheaper, but training on general language corpora (text outside the specified domain) gives poor results. Furthermore, statistical MT requires significant hardware to build and manage large translation models.

Rule-Based MT                                   Statistical MT
+ Consistent and predictable quality            – Unpredictable translation quality
+ Good out-of-domain translation quality        – Poor out-of-domain quality
+ Knows grammatical rules                       – Does not know grammar
+ High performance and robustness               – High CPU and disk-space requirements
+ Consistency between versions                  – Inconsistency between versions
– Lack of fluency                               + Good fluency
– Hard to handle exceptions to rules            + Good for catching exceptions to rules
– High development and customization costs      + Rapid, cost-effective development, provided the required corpus exists
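To make the rule-based column concrete, here is a deliberately tiny rule-based pipeline: a bilingual lexicon supplies word translations, and a single transfer rule reorders adjective and noun for a Romance-language target. Every entry and rule below is a toy; real RBMT systems parse into a full transitional representation.

```python
# Toy rule-based pipeline: one transfer rule plus dictionary lookup.
LEXICON = {"the": "le", "red": "rouge", "car": "voiture", "rolls": "roule"}

def translate(sentence):
    tokens = sentence.lower().split()
    # Transfer rule: English ADJ NOUN -> French NOUN ADJ (toy POS sets).
    adjectives, nouns = {"red"}, {"car"}
    i = 0
    while i < len(tokens) - 1:
        if tokens[i] in adjectives and tokens[i + 1] in nouns:
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            i += 2
        else:
            i += 1
    # Lexicon lookup; user-defined entries could override these defaults.
    return " ".join(LEXICON.get(t, t) for t in tokens)

print(translate("The red car rolls"))   # -> "le voiture rouge roule"
# The gender error ("le" should be "la") shows why real systems also
# need agreement rules, which is exactly where the rule count explodes.
```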
Given the overall requirements, there is a clear need for a third approach, through which users could reach better translation quality and high performance (similar to rule-based MT) with less investment (similar to statistical MT).

What is Machine Translation?
Machine translation (MT) refers to fully automated software that can translate source content into target languages. Humans may use MT to help them render text and speech into another language, or the MT software may operate without human intervention. MT tools are often used to translate vast amounts of information involving millions of words that could not possibly be translated the traditional way. The quality of MT output can vary considerably; MT systems require "training" in the desired domain and language pair to increase quality. Translation companies use MT to augment the productivity of their translators, cut costs, and provide post-editing services to clients. MT use by language service providers is growing quickly: in 2016, SDL, one of the largest translation companies in the world, announced that it translates 20 times more content with MT than with human teams.

MT Systems
Generic MT usually refers to platforms such as Google Translate, Bing, Yandex, and Naver. These platforms provide MT for ad hoc translations to millions of people. Companies can buy generic MT for batch pre-translation and connect it to their own systems via API.
Customizable MT refers to MT software that has a basic component and can be trained to improve terminology accuracy in a chosen domain (medical, legal, IP, or a company's own preferred terminology). For example, WIPO's specialist MT engine translates patents more accurately than generalist MT engines, and eBay's solution can understand and render into other languages hundreds of abbreviations used in electronic commerce.
Adaptive MT offers suggestions to translators as they type in their CAT tool and learns from their input continuously in real time. Introduced by Lilt in 2016 and by SDL in 2017, adaptive MT is believed to improve translator productivity significantly and may challenge translation memory technology in the future.

There are over 100 providers of MT technologies. Some are strictly MT developers; others are translation firms and IT giants. Examples of MT providers (based on a TAUS report): Google Translate, Microsoft Translator / Bing, SDL BeGlobal, Yandex Translate, Amazon Web Services translator, Naver, IBM Watson Language Translator, Automatic Trans, BABYLON, CCID TransTech Co., CSLi, East Linden, Eleka Ingeniaritza Linguistikoa, GrammarSoft ApS, Iconic Translation Machines, K2E-PAT, KantanMT, Kodensha, Language Engineering Company, Lighthouse IP Group, Lingenio, Lingosail Technology Co., LionBridge, Lucy Software / ULG, MorphoLogic / Globalese, Multilizer, NICT, Omniscien, Pangeanic, Precision Translation Tools (Slate), Prompsit Language Engineering, PROMT, Raytheon, Reverso Softissimo, SkyCode, Smart Communications, Sovee, SyNTHEMA, SYSTRAN, tauyou, Tilde, Trident Software, UTH International, and Worldlingo.

MT Approaches
There are three main approaches to machine translation:
First-generation rule-based (RbMT) systems rely on countless algorithms based on the grammar, syntax, and phraseology of a language.
Statistical systems (SMT) arrived with search and big data. With lots of parallel texts becoming available, SMT developers learned to pattern-match reference texts to find translations that are statistically most likely to be suitable. These systems train faster than RbMT, provided there is enough existing language material to reference.
Neural MT (NMT) uses machine learning technology to teach software how to produce the best result. This process consumes large amounts of processing power, which is why it is often run on graphics processing units. NMT started gaining visibility in 2016, and many MT providers are now switching to this technology.
A combination of two different MT methods is called hybrid MT.
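The pattern-matching idea behind SMT can be seen in miniature with NLTK's IBM Model 1, the classic word-alignment model underlying phrase-based systems; the snippet follows NLTK's documented usage. The toy bitext is orders of magnitude below the millions of words a real engine needs, so the learned probabilities are illustrative only.

```python
# Learning word-translation probabilities from a tiny German-English
# parallel corpus with IBM Model 1 (EM training). pip install nltk.
from nltk.translate import AlignedSent, IBMModel1

# Per NLTK's convention, target-language words come first in AlignedSent.
bitext = [
    AlignedSent(["klein", "ist", "das", "haus"], ["the", "house", "is", "small"]),
    AlignedSent(["das", "haus", "ist", "ja", "gross"], ["the", "house", "is", "big"]),
    AlignedSent(["das", "buch", "ist", "ja", "klein"], ["the", "book", "is", "small"]),
    AlignedSent(["das", "haus"], ["the", "house"]),
    AlignedSent(["das", "buch"], ["the", "book"]),
    AlignedSent(["ein", "buch"], ["a", "book"]),
]

ibm1 = IBMModel1(bitext, 5)   # 5 rounds of expectation-maximization

# P(German word | English word), learned purely from co-occurrence.
print(round(ibm1.translation_table["buch"]["book"], 3))   # high
print(round(ibm1.translation_table["haus"]["book"], 3))   # low
```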
Availability: API, Cloud, Server, Desktop
Google, Microsoft, IBM, Amazon, Yandex, and many others run MT software on their own infrastructure and provide it as a cloud API service, priced per character. For example, it costs $20 to translate 1 million characters with Google Translate. In contrast, developers of customizable MT, including SYSTRAN and PROMT, offer server and desktop products priced per license. In professional translation, MT is most often integrated into the CAT tool: the human linguist can pick a suggestion from MT while working through the text if no better match is found in the translation memory.

Build Your Own MT Engine
There are open-source toolkits anyone can use to build their own engines for any domain and language combination. The most popular baseline packages are Moses for SMT, OpenNMT for neural MT, and Apertium for rule-based MT. Training statistical and neural engines requires a large collection of parallel texts in two languages. Some organizations, such as TAUS, have made a service out of providing baseline data, which companies can further expand by adding their own specialist translations.

Evaluating MT Quality
Translation companies and departments typically evaluate MT quality by the effort it takes for a human to post-edit the output, often measured in pages per hour or in keystrokes per segment. Specialists training MT engines rely on automated tests and metrics, which are better suited to A/B testing and experimentation and show the impact of the tiniest changes, where humans might not notice a difference. The mainstay metric for automatic testing is BLEU. Bilingual evaluation understudy (BLEU) shows how closely an MT translation corresponds to a human translation of the same text: it compares parallel translations and produces a score between 0 (worst) and 1 (best). While BLEU scores are widely used by MT researchers, they can be manipulated, and it takes a specialist to make sense of the results. Other MT quality metrics include METEOR, ROUGE, HyTER, and NIST. Quality metrics are the focus of the QT21 program supported by GALA.
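For a hands-on sense of the metric, NLTK ships a reference implementation of sentence-level BLEU. The sentences below are invented; note that short segments need smoothing, since a single missing 4-gram otherwise drives the score to zero.

```python
# Sentence-level BLEU with NLTK: compare a candidate translation against
# one or more human references; the result lies between 0 and 1.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["the", "cat", "is", "on", "the", "mat"]
candidate = ["the", "cat", "sat", "on", "the", "mat"]

smooth = SmoothingFunction().method1   # avoids zero scores on short segments
score = sentence_bleu([reference], candidate, smoothing_function=smooth)
print(round(score, 3))   # closer to 1 means closer to the human reference
```

This also hints at why BLEU can be gamed: the score rewards n-gram overlap with the references, not meaning, so a fluent paraphrase can score worse than a stilted near-copy.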
Ethics for Translation Providers Using MT
Confidentiality – Content translated by free MT platforms such as Google Translate and Microsoft Translator is not confidential: it is stored by the platform owners and may be reused for later translations.
Notifying the client about MT use – It is a point of debate in the industry whether a translation company should notify clients about the use of MT on their projects. Many practitioners favor informing the customer of MT usage, while others may not disclose it. Be sure to ask your provider if you have questions about MT usage.

Read more: Translation Technology Descriptions; TAUS Machine Translation Market Report 2017.
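For providers who route content through a paid cloud API rather than the free web widgets, the call itself is short. The sketch below assumes Google's google-cloud-translate package and its v2 client, which may have changed since this was written; check the current documentation, and review the provider's data-use terms before sending client material.

```python
# Hedged sketch of a paid cloud MT call (Google Cloud Translation, v2
# client). Requires: pip install google-cloud-translate, plus API
# credentials configured in the environment. Paid API traffic falls
# under the provider's cloud terms, unlike the free consumer widget.
from google.cloud import translate_v2 as translate

client = translate.Client()   # reads credentials from the environment
result = client.translate("The contract is confidential.", target_language="de")
print(result["translatedText"])
```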
While BLEU scores are widely used by MT researchers, they can be manipulated, and it takes a specialist to make sense of the results. Other MT quality metrics include METEOR, ROUGE, HyTER, and NIST. Quality metrics are the focus of the QT21 program supported by GALA.

Ethics for Translation Providers Using MT

Confidentiality: Content translated by free MT platforms such as Google Translate and Microsoft Translator is not confidential. It is stored by the platform owners and may be reused for later translations.

Notifying the client about MT use: It is a point of debate in the industry whether a translation company should notify clients about the use of MT on their projects. Many pundits are in favor of informing the customer of MT usage, while others may not disclose it. Be sure to ask your provider if you have questions about MT usage.

Machine Translation (journal)

Machine Translation (Springer Netherlands; print ISSN 0922-6567, online ISSN 1573-0573; previously published as Computers and Translation) covers all branches of computational linguistics and language engineering, wherever they incorporate a multilingual aspect. It features papers on the theoretical, descriptive, or computational aspects of any of the following topics:

- compilation and use of bi- and multilingual corpora
- computer-aided language instruction and learning
- computational implications of non-Roman character sets
- connectionist approaches to translation
- contrastive linguistics
- corpus-based and statistical language modeling
- discourse phenomena and their treatment in (human or machine) translation
- history of machine translation
- human translation theory and practice
- knowledge engineering
- machine translation and machine-aided translation
- minority languages
- morphology, syntax, semantics, pragmatics
- multilingual dialogue systems
- multilingual information retrieval
- multilingual information society (sociological and legal as well as linguistic aspects)
- multilingual message understanding systems
- multilingual natural language interfaces
- multilingual text composition and generation
- multilingual word processing
- phonetics, phonology
- software localization and internationalization
- speech processing, especially for speech translation

Recent articles include "A user-study on online adaptation of neural machine translation to human post-edits" (Karimova, Simianer, and Riezler, December 2018), "Automatic quality estimation for speech translation using joint ASR and MT features" (Le, Lecouteux, and Besacier, December 2018), and "Reassessing the proper place of man and machine in translation: a pre-translation scenario" (Ive, Max, and Yvon, December 2018).
The journal has appeared since 1986; as of 2018 it comprised 32 volumes, 104 issues, and 576 articles, 8 of them open access.

What is Machine Translation?

Machine translation (MT), or automated translation, is the process by which computer software translates text from one language (such as English) to another (such as Spanish) with no human involvement. People use machine translation from Google and Microsoft in a wide range of situations, and it often seems to work well. You can certainly take advantage of machine translation, but you need to be careful not to jeopardize the overall quality of your translation project.

How does machine translation work?

Machine translation works by having a translation engine match large amounts of source- and target-language content. There are different types of machine translation engines: rule-based, statistical, and neural. Recently there has been a lot of interest in neural machine translation engines. The reason for the excitement is that neural machine translation provides better results for language pairs with less data, and the output reads much better.

Content Privacy

If your project contains confidential material, you might want to avoid using some of the very popular engines.
These technologies split each sentence into smaller segments, and it could be difficult, if not impossible, to recreate the original text from them. It may nevertheless be possible for someone to find this information online.

Involving human translators and reviewers is always necessary

All machine translation engines make mistakes. The translation you get from the machine translation engine might be literally correct, but the tone, wording, or register can be wrong. If you just use machine translation without supervision from translators or reviewers, you might get the phrase "Yo Dude!" translated as if it were "Hello Sir." Clearly, you want to avoid this.

Machine translation in memoQ

memoQ, a computer-assisted translation tool for Microsoft Windows, has integrations with 13 of the most popular machine translation engines. When translating, translators can see suggestions coming from the machine translation engines and use them if they feel they are applicable. This provides a good way to benefit from machine translation, as the translator will ensure that the localized version has the same style and feel as the original. When venturing into machine translation, you need to know that choosing an engine is not a simple task: you should weigh factors such as language pairs and content subject, among others.

Cross language barriers with Tilde custom machine translation

Customize machine translation systems for your language, your terminology, and your style.

Boost translation productivity: the use of MT has been shown to help professional translators work 35% faster, raising efficiency.

Reach global audiences: integrating MT into your online platform allows audiences to read content in their native language.

Access multilingual information: MT enables organizations to analyze and access information from all over the world, in any language.

For localization service providers: the most innovative LSPs are turning to machine translation to help them meet the growing demand for localization. MT can not only boost productivity but also help LSPs reduce costs and drive revenue.

For enterprise users: machine translation helps global businesses reach across language barriers to address consumers in all markets. Customers want to see information in their own native language; MT is the key to reaching global clients.

For public administrations: public administrations use machine translation to enable access to e-services for all citizens and residents. Tilde is a recognized leader in developing MT services for EU governments and organizations.
Why Tilde MT? Check out the special features:

Terminology integration: boost your system's accuracy with terminology integration. This ensures that industry-specific terms are translated correctly and consistently.

Full document translation: translate entire documents with the click of a button. Simply upload or drag and drop a document, and a full translation is provided in seconds.

Data library: don't have enough data? No problem. Tilde can draw from its huge multilingual Data Library to improve your MT system's capabilities.

Neural machine translation: neural machine translation produces more fluent, humanlike translations, substantially boosting the level of MT quality and accuracy.

What clients are saying:

"Tilde MT is among a 'new breed of hosted MT providers' that has successfully 'simplified access for small language companies [to MT] and enabled them to use it.' Many have flocked to it since that time." (Common Sense Advisory)

"The site translated 2,750 stories last year, but it is working on making the translation process more efficient. One way it's working to do that is through machine translations [...] with a Latvian company, Tilde. EurActiv [...] hopes the new technology will make the translation process three times faster." (Nieman Journalism Lab, Harvard University)

"Tilde has developed the machine translation tool Hugo.lv, which has considerably improved the availability of the e-government services of Latvia to customers from the Latvian, English, and Russian language communities in Europe and the whole world." (Edgars Rinkevics, Minister of Foreign Affairs, Republic of Latvia)

Machine Translation, by M. Kay

At the end of the 1950s, researchers in the United States, Russia, and Western Europe were confident that high-quality machine translation (MT) of scientific and technical documents would be possible within a very few years. After the promise had remained unrealized for a decade, the National Academy of Sciences of the United States published the much cited but little read report of its Automatic Language Processing Advisory Committee. The ALPAC Report recommended that the resources being expended on MT as a solution to immediate practical problems be redirected toward more fundamental questions of language processing that would have to be answered before any translation machine could be built. The number of laboratories working in the field was sharply reduced all over the world, and few of them were able to obtain funding for more long-range research programs in what then came to be known as computational linguistics. There was a resurgence of interest in machine translation in the 1980s and, although the approaches adopted differed little from those of the 1960s, many of the efforts, notably in Japan, were rapidly deemed successful. This seems to have had less to do with advances in linguistics and software technology, or with the greater size and speed of computers, than with a better appreciation of special situations where ingenuity might make a limited success of rudimentary MT.
The most conspicuous example was the METEO system, developed at the University of Montreal, which has long provided the French translations of the weather reports used by airlines, shipping companies, and others. Some manufacturers of machinery have found it possible to translate maintenance manuals used within their organizations (not by their customers) largely automatically, by having the technical writers use only certain words and only in carefully prescribed ways.

Why Machine Translation Is Hard

Many factors contribute to the difficulty of machine translation, including words with multiple meanings, sentences with multiple grammatical structures, uncertainty about what a pronoun refers to, and other problems of grammar. But two common misunderstandings make translation seem altogether simpler than it is. First, translation is not primarily a linguistic operation, and second, translation is not an operation that preserves meaning.

There is a famous old example that makes the first point well. Consider the sentence: "The police refused the students a permit because they feared violence." Suppose that it is to be translated into a language like French, in which the word for "police" is feminine. Presumably the pronoun that translates "they" will also have to be feminine. Now replace the word "feared" with "advocated." Suddenly, it seems that "they" refers to the students and not to the police and, if the word for students is masculine, it will therefore require a different translation. The knowledge required to reach these conclusions has nothing linguistic about it. It has to do with everyday facts about students, police, violence, and the kinds of relationships we have seen these things enter into.

The second point is, of course, closely related. Consider the following question, stated in French: "Où voulez-vous que je me mette?" It means literally, "Where do you want me to put myself?" but it is a very natural translation for a whole family of English questions of the form "Where do you want me to sit/stand/sign my name/park/tie up my boat?" In most situations, the English "Where do you want me?" would be acceptable, but it is natural and routine to add or delete information in order to produce a fluent translation. Sometimes it cannot be avoided, because there are languages like French, in which pronouns must show number and gender; Japanese, where pronouns are often omitted altogether; Russian, where there are no articles; Chinese, where nouns do not differentiate singular from plural nor verbs present from past; and German, where the flexibility of the word order can leave uncertainty about what is the subject and what is the object.

The Structure of Machine Translation Systems

While there have been many variants, most MT systems, and certainly those that have found practical application, have parts that can be named for the chapters in a linguistics textbook. They have lexical, morphological, syntactic, and possibly semantic components, one for each of the two languages, for treating basic words, complex words, sentences, and meanings. Each feeds into the next until a very abstract representation of the sentence is produced by the last one in the chain. There is also a "transfer" component, the only one that is specialized for a particular pair of languages, which converts the most abstract source representation that can be achieved into a corresponding abstract target representation. The target sentence is produced from this essentially by reversing the analysis process. Some systems make use of a so-called "interlingua," or intermediate language, in which case the transfer stage is divided into two steps, one translating a source sentence into the interlingua and the other translating the result of this into an abstract representation in the target language.
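The classical transfer pipeline just described can be sketched schematically in Python. Everything below (the toy lexicon, the part-of-speech table, the single reordering rule) is invented for illustration; in real systems each stage is a large hand-built module.

```python
# A toy transfer-based MT pipeline: analysis -> transfer -> generation.
# The lexicon, POS table, and reordering rule are invented for illustration;
# unknown words would need real morphological analysis.

LEXICON = {("the", "DET"): "el", ("black", "ADJ"): "negro", ("cat", "NOUN"): "gato"}
POS = {"the": "DET", "black": "ADJ", "cat": "NOUN"}

def analyze(sentence):
    # Morphological/syntactic analysis: here, just tokenization and POS lookup.
    return [(tok, POS[tok]) for tok in sentence.lower().split()]

def transfer(analysis):
    # The transfer component is the only language-pair-specific part.
    # Structural rule: English ADJ+NOUN becomes Spanish NOUN+ADJ.
    out, i = [], 0
    while i < len(analysis):
        if i + 1 < len(analysis) and analysis[i][1] == "ADJ" and analysis[i + 1][1] == "NOUN":
            out += [analysis[i + 1], analysis[i]]
            i += 2
        else:
            out.append(analysis[i])
            i += 1
    return out

def generate(target_analysis):
    # Generation: look up target words and reassemble the sentence.
    return " ".join(LEXICON[pair] for pair in target_analysis)

print(generate(transfer(analyze("the black cat"))))  # -> "el gato negro"
```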
Machine translation with post-editing

Benefits of machine translation: immediate insight into texts in foreign languages, regardless of volume and format; professional post-editing to manage the quality of automated translation according to your requirements; and a 30-50% reduction in translation delivery time and cost.

Our approach to machine translation: MT solutions customized for a particular company or industry; MT post-editing services at a competitive rate; adjustable delivery models, from SaaS to on-premise; integration of automated translation into corporate content management systems and workflows; and confidential translations.

Our expertise includes working with three types of machine translation systems:

Statistical (SMT) engines analyze the source language based on existing lexical resources, such as good-quality translation memories and terminology databases, to select the most appropriate translations in the target language. Providing high-quality data is paramount to ensuring the desired MT output.

Rule-based (RBMT) engines analyze the grammar in each segment and then invoke specific rules to derive target from source for a given language pair. These engines also analyze the syntactic structure of the source sentence, trying to adapt it to common target-language patterns.

Model-based (MBMT) engines use full semantic and syntactic analysis of the source text prior to translation. This MT system is based on the patented ABBYY Compreno technology. It transforms strings of characters into data that "makes sense" to a computer and then generates the translation based on the meaning of the source text.

We go several steps further by combining controlled language, translation memory, and MT systems, all of which are enhanced by ABBYY's own semantic, morphological, and lexical analysis tools. This customization, based on an individual client's data, ensures maximum quality of the output before it is sent to post-editing. The specific scenario (raw or post-edited MT) is always up to the client. In any case, our job is to deliver an MT solution that saves both time and money and keeps operational overhead low.

A lot of people have been talking about machine translation in customer service. It would make replying to tickets more efficient, improve the customer experience, and even help companies expand to different countries without having to hire native agents. However, most customer service leaders are skeptical about introducing this technology into their workflows, and for good reason. For most people, Google Translate is the first thing that comes to mind at the mention of machine translation (MT).
But would you really trust it to translate everything you need to tell your customers? Probably not, considering mistakes like mistranslating the name of a Spanish food festival as "clitoris festival," or identifying the phrase "Ooga Booga Wooga" as Somali. Nonetheless, this doesn't make machine translation irrelevant to customer service. And here's why.

Even if you have the best customer support agents at your disposal, their ability to serve clients has one obvious limitation: language. So, what are your options if you need to provide customer support in markets with different languages? You can hire a bunch of native agents and train them (which is costly and time-consuming). Or you can automate translation (reducing costs and making your team more efficient). Imagine if your French-speaking agents could seamlessly communicate with Chinese customers in their native language (in this case, Mandarin). Wouldn't it be great? Or if you could distribute multilingual customer support tickets equally among team members, regardless of what languages they speak, during peak season? Wouldn't it be the holy grail of operational efficiency? The answer is yes, of course it would. But there's one significant detail that prevents most customer service managers from automating translation, and that is quality.

Machine translation quality: we're gonna have to earn it

CS operational managers (as well as most people) perceive translation quality from a single point of view: it must be perfect. On the other hand, in rapidly expanding businesses, language is nothing more than a tool, and its quality should be fit for purpose. So how do you make sure you have the highest-quality translations without having to hire an international community of customer support agents to rival the Eurovision lineup? Well, that's what we're working on at Unbabel. We combine the best of machine translation with a community of tens of thousands of bilingual editors who review and approve the translations. And part of the reason why we involve humans in the process is that machine translation alone can't yet deliver the quality we need.

For machine translation to work, we need human translations to feed into the systems and train them. Once the system receives all the data, it starts to learn patterns and to produce better translations. But what if humans weren't involved in this process? Would machine translation be enough for customer service? I doubt it. And let me tell you why. At Unbabel, we have translated crazy amounts of customer support messages for companies like Booking.com, EasyJet, Under Armour, and King, and if there's one thing we know, it's that machines make mistakes (some of which are not so easy to spot). Below are some of the most common mistakes made by machines in translations of customer service messages, mistakes that our community of editors have spotted and corrected.

1. Corrupted meaning: it's a free-for-all

No company likes to give things away for free. Needless to say, it'd be bad for business if your translations left customers thinking that you do. Here's an example of an actual translation which had to be reviewed and edited by our bilingual editors:

Source (English): You recently notified us of the possibility that copyrighted material was being made available through our website.

Machine translation (German): Sie haben uns vor Kurzem von der Überzeugung in Kenntnis gesetzt, dass urheberrechtlich geschütztes Material auf unserer Website kostenlos verfügbar ist.
[You recently notified us of a belief that copyrighted material was being made available at no cost through our website.]

The problem with this is that the word "available" was translated into German as "available at no cost."

2. MT peculiarities: where am I?

Some travelers learn to love the unexpected. But nobody wants to end up stranded in the wrong city because of a translation error.

Source (Russian): Наш хостел расположен в деревне Туришкино, которая находится в 60 км от Санкт-Петербурга. [Our hostel is located in the village of Turishkino, which is 60 km away from St. Petersburg.]

Machine translation (English): Our hostel is located in village Tururushkaino, which is 60 km away from St. Petersburg.

Since the neural machine translation system did not have the name of the village "Туришкино" in its lexical bank (to be fair, it's a pretty rare word), it had to translate it into something else. Wrong translation, wrong city. This may also happen when you convert units of length:

Source (English): If you live just 20 kilometres away from San Diego, you may consider driving to the Westfield Mission Valley mall and collecting it yourself.

Machine translation (French): Si vous habitez à seulement 20 milles de San Diego, vous pouvez envisager de vous rendre au centre commercial Westfield Mission Valley et de le récupérer vous-même. [Here "20 kilometres" became "20 miles."]

3. MT hallucinations: the ghost of texts past

Sometimes machines see things that aren't actually there, haunted as they are by the memory of translations in their database. We like to call this phenomenon MT hallucinations. For instance, the machine may add unnecessary words to the translation, as in the example below:

Source (English): The contract is understandable.

Machine translation (French): Le contrat est compréhensible, veuillez nous appeler dès que possible. [The contract is understandable, please call us as soon as possible.]

In this case, what happened was that the MT system referred to previous translation examples and generated an extra clause which did not appear in the source text: "please call us as soon as possible." But the MT system can also do the opposite and erase parts of the message:

Source (English): It looks like it took a while for the subscription to be marked inactive but it is cancelled now.

Machine translation (German): Es scheint, dass es eine Weile gedauert hat, bis das Abonnement als inaktiv markiert wurde. [It looks like it took a while for the subscription to be marked inactive.]

In this case, the whole chunk of text "but it is cancelled now" was not translated into German.

4. Register and tone of voice: what did you call me?

Languages have their own set of rules; that's part of the reason why translation is so difficult. But when it comes to adapting register and tone of voice in customer service, you need to be extra careful with how you address people. Here's a simple example of the incorrect use of a pronoun in machine translation:

Source (English): Make sure you have the latest operating system on your device.

Machine translation (German): Stellst du sicher, dass du das neueste Betriebssystem auf deinem Gerät hast. [Make sure you [informal] have the latest operating system on your [informal] device.]

The customer usually defines the choice of register. However, the use of an inappropriate register (like the informal "du" instead of the formal "Sie" in this example) can be a real threat when communicating with customers, who may see it as impolite.

5. Overtranslations: is that an off-brand?
Some words are not supposed to be translated, like a company's name or a person's name. But machines don't always know that. So, overtranslations such as this one are quite common:

Source (English): I checked with the seller and as long as it is not a Rapid Cheetah product, it is fine.

Machine translation (German): Ich habe mit dem Verkäufer überprüft und solange es kein Schnellesgeparden produkt ist, ist es in Ordnung. [I checked with the seller and as long as it is not a rapid cheetah product, it is fine.]

Here, the brand's name, "Rapid Cheetah," is given as a literal translation in German. Sure, it's funny, but it can also be confusing or even off-putting to customers.

6. Inconsistent or incorrect use of terminology: too many words

One word may have different translations, and you need to know exactly which one to use when communicating with your customers. And when things go wrong, it can look weird:

Source (English): Packages 1 and 2 both charge a monthly fee, as these have additional features to Package 1.

Machine translation (Dutch): Pakketten 1 en 2 vragen elk een maandelijks bedrag, omdat deze extra functies hebben voor Pakket 1. [Abonnements 1 and 2 both charge a monthly fee, as these have additional features to Abonnement 1.]

In this example, the term "package" was required to be translated as "abonnement" and not "pakket." I guess the MT system chose the wrong word.

In short, pure machine translation systems lack the "human touch" required for understanding cultural references and contextual differences. Today, however, MT combined with advanced, automated quality assurance and post-editing by humans ensures translations that are sound, and sound good, often delivered within 20 minutes. This is a game changer for customer service, where it's really not just a matter of quality but also speed. In a world where customers are not willing to wait more than 10 minutes to get their problems solved, attending to their needs in their native language on time is crucial. And this is where machine translation can help. Machine translation may not be at the end of its road, but it has come a long way toward meeting critical business needs. And this is just the beginning.

(Maxim Khalilov, Ph.D., is the director of Applied AI at Unbabel.)

Machine Translation (MT) is a technology that automatically translates text using termbases and advanced grammatical, syntactic, and semantic analysis techniques.
The idea that computers can translate human languages is as old as computers themselves. The first attempts to build such technology in the 1950s in the USA were accompanied by a lot of enthusiasm and significant funding. However, the first decade of research failed to produce a usable system, and the now-famous report by the Automatic Language Processing Advisory Committee (ALPAC) in 1966 found that the ten-year effort had failed to fulfill expectations. The next time the general public heard of MT was likely in the late 1990s, when the internet portal AltaVista launched a free online translation service called Babelfish. Although the quality was often lacking, it became immensely popular and brought MT into the limelight again. Other internet giants presented similar services soon after, the best known of which is now Google Translate. Despite great strides in technology and the addition of dozens of new language pairs, these free services are usable for "gist" or casual translation, but usually not for commercial purposes. On the other hand, commercial providers of MT technology have worked on improving their paid offerings, and with customization such machine translation engines are finding commercial use in limited areas. However, challenges with understanding context, tone, language registers, and informal expression remain the reason why MT is not expected to replace human translators in the foreseeable future. The main use cases for machine translation are applications that require real-time or near real-time interaction, assimilating texts and "chat," and productivity tools supporting human translators. Machine translation is not to be confused with Computer-Aided Translation (CAT) tools.

What is MT Suitable for?

The most common uses of MT technology are as follows:

Gisting – The results of MT are generally not as good as translations produced by humans, but are useful for understanding roughly what a text says. Such translation may be good enough depending on the purpose and target audience.

MT-human – In some cases, human translators edit machine translation results to produce final translations, in what is called post-editing.

Instant need – MT can be used for providing translations of materials that are time-sensitive and cannot wait for the time required for human translation, such as results from database queries.

Controlled language – For texts written in controlled language, customized MT engines can provide very high-quality translations, for example in the translation of patents or technical specification sheets.

High volume – Content producers are generating exponentially increasing volumes of material, and in many cases human translation is simply not economically or technically feasible.

Pseudotranslation – Localizers can use MT to translate source text to check for internationalization issues in the target languages before committing to professional translation.

Support for human translators – Modern CAT tools allow users to translate source segments with MT. Translators can decide to use the results as they are or edit them manually, which can speed up their work.

Types of Machine Translation

Rule-Based Machine Translation (RBMT)

RBMT, developed several decades ago, was the first practical approach to machine translation. It works by parsing a source sentence to identify words and analyze its structure, and then converting it into the target language based on a manually determined set of rules encoded by linguistic experts.
The rules attempt to define correspondences between the structure of the source language and that of the target language. The advantage of RBMT is that a good engine can translate a wide range of texts without the need for the large bilingual corpora that statistical machine translation requires. However, the development of an RBMT system is time-consuming and labor-intensive and may take several years for one language pair. Additionally, human-encoded rules are unable to cover all possible linguistic phenomena, and conflicts between existing rules may lead to poor translation quality when facing real-life texts. For example, RBMT engines don't deal well with slang or metaphorical texts. For this reason, rule-based translation has largely been replaced by statistical machine translation or hybrid systems, though it remains useful for less common language pairs where there are not enough corpora to train an SMT engine.

Statistical Machine Translation (SMT)

SMT works by training the translation engine with a very large volume of bilingual (source texts and their translations) and monolingual corpora. The system looks for statistical correlations between source texts and translations, both for entire segments and for shorter phrases within each segment, building a so-called translation model. It then generates confidence scores for how likely it is that a given source text will map to a translation. (A toy sketch of this scoring idea appears after the survey of example-based MT below.) The translation engine itself has no notion of rules or grammar. SMT is the core of the systems used by Google Translate and Bing Translator, and it is the most common form of MT in use today. The key advantage of statistical machine translation is that it eliminates the need to handcraft a translation engine for each language pair and to create linguistic rule sets, as is the case with RBMT. With a large enough collection of texts, you can train a generic translation engine for any language pair, and even for a particular industry or domain of expertise. With large and suitable training corpora, SMT usually translates well enough for comprehension. The main disadvantage of statistical machine translation is that it requires very large and well-organized bilingual corpora for each language pair. SMT engines fail when presented with texts that are not similar to the material in the training corpora; for example, a translation engine that was trained using technical texts will have a difficult time translating texts written in casual style. Therefore, it is important to train the engine with texts that are similar to the material that will be translated.

Example-Based Machine Translation (EBMT)

In an EBMT system, a sentence is translated by analogy. A number of existing translation pairs of source and target sentences are used as examples. When a new source sentence is to be translated, the examples are retrieved to find similar ones in the source; the target sentence is then generated by imitating the translation of the matched examples. Because the hit rate for long sentences is very low, the examples and the source sentence are usually broken down into small fragments. This approach may produce high-quality translation when highly similar examples are found. Conversely, when no similar example is found, the translation quality may be very low. EBMT has not been widely deployed as a commercial service.
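Returning to the statistical approach: the "translation model times language model" scoring that SMT performs can be caricatured in a few lines of Python. The phrases and probabilities below are invented toy values, not real corpus statistics, and a real decoder also searches over segmentations and reorderings rather than scoring whole phrases in isolation.

```python
# Toy noisy-channel SMT scoring: choose the target phrase e maximizing
# P(f | e) * P(e), i.e., translation-model probability times
# language-model probability. All numbers are invented.

translation_model = {  # P(source_phrase | target_phrase)
    ("chat", "cat"): 0.5,
    ("chat", "online chat"): 0.9,
}
language_model = {  # P(target_phrase), estimated from monolingual data
    "cat": 0.010,
    "online chat": 0.001,
}

def best_translation(source_phrase):
    candidates = [e for (f, e) in translation_model if f == source_phrase]
    return max(candidates,
               key=lambda e: translation_model[(source_phrase, e)] * language_model[e])

# French "chat": the language model tips the balance toward "cat".
print(best_translation("chat"))  # -> "cat" (0.5 * 0.010 beats 0.9 * 0.001)
```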
Neural Machine Translation (NMT)

Neural machine translation (NMT) is based on the paradigm of machine learning and is the newest approach to MT. NMT uses neural networks that consist of nodes conceptually modeled after the human brain. The nodes can hold single words, phrases, or longer segments, and they relate to each other in a web of complex relationships based on the bilingual texts used to train the system. The complex and dynamic nature of such networks allows the formation of significantly more educated guesses about the context, and therefore the meaning, of any word to be translated. NMT systems continuously learn and adjust to provide the best output, and they require a lot of processing power, which is why this approach has only become viable in recent years.

Hybrid

All the approaches above have their shortcomings, and many hybrid MT approaches have been proposed. The main categories of hybrid systems are: rule-based engines using statistical translation for post-processing and cleanup; statistical systems guided by rule-based engines; and either of the above combined with some input from a neural machine translation system. In the first case, the text is translated first by an RBMT engine; this translation is then processed by an SMT engine, which corrects any errors it made. In the second case, the RBMT engine does not translate the text but supports the SMT engine by inserting metadata (e.g., noun/verb/adjective, present/past tense, etc.). Almost all practical MT systems adopt hybrid approaches to a certain extent, combining rule-based and statistical approaches, and more and more systems now also take advantage of NMT to different degrees.

Measuring Quality of MT

Measuring and benchmarking MT quality remains a difficult challenge. While standardized quality scales exist, they provide only a comparative, not an absolute, measure of quality. This matters because what's really needed is an automated way to identify problem texts so they can be routed for human review and post-editing. At present, the standard practice is to have human reviewers look at a certain percentage of texts, or spend an assigned amount of time reviewing a subset of a project. The most reliable method of MT quality evaluation requires human evaluators to score each sentence, either within text translated by an MT engine or in comparison with others. The average score on all the sentences from all evaluators is the final score. The most common criteria for human scoring are the adequacy and fluency of the translation. Human evaluation is expensive and time-consuming and thus unsuitable for frequent use during the research and development of MT engines. Various automatic evaluation methods are therefore available to measure the similarity between an MT translation and one produced by a human translator. Some examples:

Word error rate (WER) is defined based on the distance between the system output and the reference translation at the word level.

Position-independent error rate (PER) calculates the word error rate by treating each sentence as a bag of words and ignoring word order.

Bilingual Evaluation Understudy (BLEU) computes n-gram precision rather than word error rate.

Metric for Evaluation of Translation with Explicit Ordering (METEOR) takes stemming and synonyms into consideration.

Automatic translation quality evaluation plays an important role in MT research, since it helps measure quality between iterations of an engine and between different engines. However, the correlation between automatic and human evaluation metrics is not satisfactory.
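As a concrete illustration of the first metric on this list, here is a compact word error rate implementation using the standard word-level edit-distance dynamic program. This is a generic sketch rather than any particular toolkit's code.

```python
def word_error_rate(hypothesis, reference):
    """WER: word-level edit distance between hypothesis and reference,
    normalized by the reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    # dp[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One deleted word out of six reference words: WER ≈ 0.167.
print(word_error_rate("the cat sat on mat", "the cat sat on the mat"))
```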
Unsupervised machine translation: A novel approach to provide fast, accurate translations for more languages

Marc'Aurelio Ranzato, Guillaume Lample, Myle Ott (posted on Aug 31, 2018, to AI Research)

Automatic language translation is important to Facebook as a way to allow the billions of people who use our services to connect and communicate in their preferred language. To do this well, current machine translation (MT) systems require access to a considerable volume of translated text (e.g., pairs of the same text in both English and Spanish). As a result, MT currently works well only for the small subset of languages for which a volume of translations is readily available. Training an MT model without access to any translation resources at training time (known as unsupervised translation) was the necessary next step. Research we are presenting at EMNLP 2018 outlines our recent accomplishments with that task. Our new approach provides a dramatic improvement over previous state-of-the-art unsupervised approaches and is equivalent to supervised approaches trained with nearly 100,000 reference translations. To give some idea of the level of advancement, an improvement of 1 BLEU point (a common metric for judging the accuracy of MT) is considered a remarkable achievement in this field; our methods showed an improvement of more than 10 BLEU points. This is an important finding for MT in general, and especially for the majority of the 6,500 languages in the world for which the pool of available translation training resources is either nonexistent or so small that it cannot be used with existing systems. For low-resource languages, there is now a way to learn to translate between, say, Urdu and English by having access only to text in English and completely unrelated text in Urdu, without having any of the respective translations. This new method opens the door to faster, more accurate translations for many more languages. And it may only be the beginning of ways in which these principles can be applied to machine learning and artificial intelligence.
Word-by-word translation

The first step toward our ambitious goal was for the system to learn a bilingual dictionary, which associates a word with its plausible translations in the other language. For this, we used a method we introduced in a previous paper, in which the system first learns word embeddings (vectorial representations of words) for every word in each language. Word embeddings are trained to predict the words around a given word using context (e.g., the five words preceding and the five words following a given word). Despite their simplicity, word embeddings capture interesting semantic structure. For instance, the nearest neighbor of "kitty" is "cat," and the embedding of the word "kitty" is much closer to the embedding of "animal" than it is to the embedding of the word "rocket" (as "rocket" seldom appears in the context of the word "kitty"). Moreover, embeddings of words in different languages share similar neighborhood structure, because people across the world share the same physical world; for instance, the relationship between the words "cat" and "furry" in English is similar to that between their Spanish translations ("gato" and "peludo"), as the frequency of these words and their contexts are similar.

Because of those similarities, we proposed having the system learn a rotation of the word embeddings in one language to match the word embeddings in the other language, using a combination of various new and old techniques, such as adversarial training. With that information, we can infer a fairly accurate bilingual dictionary without access to any translation and essentially perform word-by-word translation. (A minimal numeric sketch of this alignment idea follows below.)

[Figure: Two-dimensional word embeddings in two languages (left) can be aligned via a simple rotation (right). After the rotation, word translation is performed via nearest-neighbor search.]
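When a seed dictionary is available, the alignment mentioned above can be computed in closed form as the orthogonal Procrustes problem. The sketch below uses tiny invented 2-D "embeddings"; real systems use vectors with hundreds of dimensions and, in the fully unsupervised setting, obtain the seed pairs adversarially rather than from a dictionary.

```python
# Minimal sketch: align two word-embedding spaces with an orthogonal map
# (the Procrustes solution), given a small seed dictionary.
import numpy as np

# Rows are embeddings of seed-dictionary word pairs, e.g. (cat, gato), ...
X = np.array([[0.9, 0.1], [0.8, 0.3], [0.1, 0.9]])  # English vectors (toy)
Y = np.array([[0.1, 0.9], [0.3, 0.8], [0.9, 0.1]])  # Spanish vectors (toy)

# Orthogonal Procrustes: W = U V^T, where U S V^T is the SVD of Y^T X.
U, _, Vt = np.linalg.svd(Y.T @ X)
W = U @ Vt  # maps English vectors into the Spanish space

def translate(vec, target_matrix, words):
    # Nearest-neighbor search (cosine similarity) in the aligned space.
    mapped = vec @ W.T
    sims = (target_matrix @ mapped) / (
        np.linalg.norm(target_matrix, axis=1) * np.linalg.norm(mapped))
    return words[int(np.argmax(sims))]

print(translate(X[0], Y, ["gato", "perro", "manzana"]))  # -> "gato"
```

The same nearest-neighbor search, run over an entire vocabulary, reads a bilingual dictionary directly off the aligned spaces.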
Translating sentences

Word-by-word translation using a bilingual dictionary inferred in an unsupervised way is not a great translation: words may be missing, out of order, or just plain wrong. However, it preserves most of the meaning. We can improve upon it by making local edits using a language model that has been trained on lots of monolingual data to score sequences of words, in such a way that fluent sentences score higher than ungrammatical or poorly constructed sentences. So, if we have a large monolingual data set in Urdu, we can train a language model in Urdu alongside the language model we have for English.

Equipped with a language model and the word-by-word initialization, we can now build an early version of a translation system. Although it's not very good yet, this early system is already better than word-by-word translation (thanks to the language model), and it can be used to translate lots of sentences from the source language (Urdu) to the target language (English). Next, we treat these system translations (original sentence in Urdu, translation in English) as ground-truth data to train an MT system in the opposite direction, from English to Urdu. Admittedly, the input English sentences will be somewhat corrupted by the translation errors of the first system. This technique was introduced by R. Sennrich et al. at ACL 2015 in the context of semisupervised learning of MT systems (for which a good number of parallel sentences are available), and it was dubbed back translation. This is the first time the technique has been applied to a fully unsupervised system; typically, it is initially trained on supervised data.

Now that we have an Urdu language model that will prefer the more fluent sentences, we can combine the artificially generated parallel sentences from our back translation with the corrections provided by the Urdu language model to train a translation system from English to Urdu. Once that system has been trained, we can use it to translate many sentences in English to Urdu, forming another data set of the kind (original sentence in English, translation in Urdu) that can improve the previous Urdu-to-English MT system. As one system gets better, we can use it to produce training data for the system in the opposite direction, in an iterative manner, for as many iterations as desired.

[Figure. Top: a sentence in English is translated to Urdu using the current En-Ur MT system; the Ur-En MT system then takes that Urdu translation as input and produces an English translation. The error between "cats are crazy" and "cats are lazy" is used to change the parameters so that the Ur-En MT system is more likely to output the correct sentence at the next iteration. Bottom: the same process in reverse, using the Ur-En MT system to provide data for the En-Ur MT system.]
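The iterative procedure just described has a simple loop structure, caricatured below in runnable form. Here a "model" is just a word-for-word dictionary learned by position-aligning sentence pairs, and the romanized "Urdu" strings are toy stand-ins; real systems train a neural model at each step.

```python
# A deliberately tiny, runnable caricature of iterative back-translation.
from collections import Counter, defaultdict

def train(pairs):
    # "Training": position-align words in (source, target) sentence pairs
    # and keep the most frequent mapping for each source word.
    counts = defaultdict(Counter)
    for src, tgt in pairs:
        for s, t in zip(src.split(), tgt.split()):
            counts[s][t] += 1
    table = {s: c.most_common(1)[0][0] for s, c in counts.items()}
    return lambda sents: [" ".join(table.get(w, w) for w in s.split()) for s in sents]

def iterative_back_translation(mono_src, mono_tgt, seed_model, rounds=2):
    src_to_tgt = seed_model  # word-by-word dictionary + LM initialization
    for _ in range(rounds):
        synthetic_tgt = src_to_tgt(mono_src)                    # forward translate
        tgt_to_src = train(list(zip(synthetic_tgt, mono_src)))  # train reverse system
        synthetic_src = tgt_to_src(mono_tgt)                    # back translate
        src_to_tgt = train(list(zip(synthetic_src, mono_tgt)))  # retrain forward system
    return src_to_tgt

# Toy seed dictionary; "billi susti hai" is a romanized toy stand-in for Urdu.
seed = train([("cats are lazy", "billi susti hai")])
model = iterative_back_translation(["cats are lazy"], ["billi susti hai"], seed)
print(model(["cats are lazy"]))  # -> ['billi susti hai']
```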
The best of both worlds

In our research, we identified three steps (word-by-word initialization, language modeling, and back translation) as important principles for unsupervised MT. Equipped with these principles, we can derive various models. We applied them to two very different methods to tackle our goal of unsupervised MT. The first was an unsupervised neural model that was more fluent than word-by-word translation but did not produce translations of the quality we wanted. They were, however, good enough to be used as back-translation sentences. With back translation, this method performed about as well as a supervised model trained with 100,000 parallel sentences. Next, we applied the principles to another model based on classical count-based statistical methods, dubbed phrase-based MT. These models tend to perform better on low-resource language pairs, which made the approach particularly interesting, and this is the first time the method has been applied to unsupervised MT. In this case, we found that the translations had the correct words but were less fluent. Again, this method outperformed previous state-of-the-art unsupervised models. Finally, we combined both models to get the best of both worlds: a model that is both fluent and good at translating. To do this, we started from a trained neural model and then trained it with additional back-translated sentences from the phrase-based model. Empirically, we found that this last combined approach dramatically improved accuracy over the previous state-of-the-art unsupervised MT, showing an improvement of more than 10 BLEU points on English-French and English-German, two language pairs that have been used as a test bed (and even for these pairs, no parallel data is used at training time; it is used only at test time, to evaluate). We also tested our methods on distant language pairs like English-Russian; on low-resource languages like English-Romanian; and on an extremely low-resource and distant language pair, English-Urdu. In all cases, our method greatly improved over other unsupervised approaches, and sometimes even over supervised approaches that use parallel data from other domains or from other languages.

[Figure: German-to-English translation examples showing the results of each machine translation method.]

Beyond MT

Achieving an increase of more than 10 BLEU points is an exciting start, but even more exciting for us are the possibilities this opens for future improvements. In the short term, this will certainly help us translate in many more languages and improve translation quality for low-resource languages. But the learnings gained from this new method and the underlying principles could go well beyond MT. We see potential for this research to be applied to unsupervised learning in any arena, potentially allowing agents to leverage unlabeled data and perform tasks with very few, if any, of the expert demonstrations (translations, in this case) that are currently required. This work shows that it is at least possible for a system to learn without supervision and to build a coupled system in which each component improves over time, in a sort of virtuous circle.

Accurate, natural language translation: get started with Amazon Translate

Amazon Translate is a neural machine translation service that delivers fast, affordable, high-quality language translation. Neural machine translation is a machine translation method that uses deep learning models to produce translation that is more fluent and natural-sounding than traditional statistical and rule-based translation algorithms. Amazon Translate lets you localize content (websites and applications) for international users and easily translate large volumes of text. (AWS Summit San Francisco 2018: Amazon Translate is now available to everyone.)

Benefits: extreme accuracy and continuous learning; easy integration into your applications; customizable; scalable. Amazon Translate's translation engines learn continuously from new and expanded data sets to produce more accurate translations for a wide range of use cases. Amazon Translate makes it simple to build real-time and batch translation capabilities into your applications with a single API call, so you can easily localize an application or website, or process multilingual data within your existing workflows. Amazon Translate also lets you define how your brand names, character names, model names, and other unique terms are translated, using its custom terminology feature. The ability to customize results with custom terminology can reduce the number of translations professional translators have to edit, leading to cost savings and faster turnaround. Whether you have a few words or large volumes of text, Amazon Translate scales easily to your translation needs, providing fast, reliable translation regardless of the volume of translation requests you submit.
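As an illustration of the "single API call" integration described above, here is a short Python sketch using the AWS SDK (boto3). The call shown is the Translate service's standard TranslateText operation; check the AWS documentation for current parameters and supported language codes before relying on this.

```python
# Minimal sketch: real-time translation through the Amazon Translate API
# using boto3. Assumes AWS credentials are configured in the environment.
import boto3

translate = boto3.client("translate", region_name="us-east-1")

response = translate.translate_text(
    Text="Machine translation has come a long way.",
    SourceLanguageCode="en",
    TargetLanguageCode="fr",
)
print(response["TranslatedText"])
```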
Machine translation use cases: multilingual sentiment analysis of social media content; on-the-fly translation of user-generated content; adding real-time translation to communication applications.

With Amazon Translate, you are not limited by the language barrier. Track the social sentiment around your brand, product, or service while monitoring online conversations in different languages: simply translate the text into English before using a natural language processing (NLP) application such as Amazon Comprehend to analyze text content in a multitude of languages.

It is very difficult for human translation teams to keep up with dynamic or real-time content. With Amazon Translate, you can easily translate large volumes of user-generated content in real time. Websites and applications can automatically render content such as articles, profile descriptions, and comments in the user's language at the click of a "Translate" button.

Amazon Translate can provide automatic translation to enable cross-lingual communication between the users of your applications. By adding real-time translation to chat, messaging, help-desk, and ticketing applications, an English-speaking agent or employee can communicate with customers in multiple languages.

Amazon Translate customers:

"At Hotels.com, we are committed to providing our customers with the most relevant and up-to-date information about their destination. To do that, we run 90 localized websites in 41 languages. We have more than 25 million customer reviews, with more coming in every day, which makes our sites ideal candidates for machine translation. We evaluated Amazon Translate and several other solutions, and in our view Amazon Translate is a fast, efficient, and above all accurate service. We want to take advantage of the latest advances in machine learning and of the transition to neural engines to further personalize and localize reviews, and to improve our customers' overall experience. Amazon Translate is a step forward in that direction." (Matthew Fryer, Vice President and Chief Data Science Officer, Hotels.com)

"Today's digital businesses are under pressure to produce ever more content, faster and with greater relevance. Human translators armed with machine translation help businesses localize more content, faster, at lower cost, and in more languages. In our experience, by pairing Amazon Translate with a human editor, we believe we can generate savings of up to 20%." (Ken Watson, CTO, Lionbridge)

"Using our services and our technology, global businesses can quickly localize massive amounts of content while maintaining high quality. We are delighted with the early results we obtained with Amazon Translate on a translation project we launched for our client iHerb. Overall turnaround time was reduced by 67% while maintaining the same high quality standards. Our total costs were reduced in proportion, allowing us to offer our customers even more competitive prices." (Ofer Shoshan, CEO, One Hour Translation)

"At Isentia, we built our media intelligence software in a single language.
"At Isentia, we built our media intelligence software in a single language. To expand our capabilities and meet our customers' diverse language needs, we needed translation help to generate and deliver valuable insights from media content published in languages other than English. After trying many machine translation services, we were impressed by how easily Amazon Translate fits into our pipeline and by its ability to scale regardless of the volume generated. The translations are also more accurate and more nuanced, and they meet the high standards our customers expect." Andrea Walsh, chief information officer, Isentia

[Map: Locations mentioned in global news coverage monitored by GDELT 2015-2018, colored by the primary language of coverage mentioning each location. Kalev Leetaru]

Imagine a world without language barriers, where anyone can access real-time information from anywhere in the world in any language, seamlessly translated into their native tongue, and where their own writings are equally accessible to speakers of all the world's languages. Mass machine translation that eliminates barriers to information access and communication, creating a post-lingual society, has been a dream of science fiction writers since time immemorial. Yet even as the digital world increasingly eliminates geographic barriers and makes it possible to hear from an ever-greater portion of the world's citizenry, language barriers mean much of the world's information remains inaccessible.

The most basic approach to searching across languages is simply to translate keyword searches from one language to another, either through a preexisting translation reference or through machine translation. Unfortunately, the differences between languages mean that a word in one language can translate into dozens of equivalents in another, turning a simple one-word search into a massive Boolean query. Traditional machine translation systems are not typically able to provide the complete list of every possible translation of a word from one language to another. For example, type "New York" into Google Translate and you'll get New York back, while Bing Translate will offer New Yorgis. In reality, New York can be rendered fourteen different ways in Estonian: "New York", "New Yorki", "New Yorgi", "New Yorgisse", "New Yorgis", "New Yorgist", "New Yorgile", "New Yorgil", "New Yorgilt", "New Yorgiks", "New Yorgini", "New Yorgina", "New Yorgita", and "New Yorgiga". This means that robustly searching for a given word or phrase in another language will often require the assistance of a person with native fluency to craft the appropriate queries. Moreover, if the goal is to offer more than basic keyword searches, then any natural language processing algorithms will need to be designed to handle every single language of interest. Unfortunately, the dearth of training data for all but a handful of languages means that few algorithms or tools are available for most of the world's languages. The result is that despite having digital access to an almost unimaginable wealth of knowledge from across the planet, we rarely see the world beyond that captured in our own language.
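The two search strategies the article contrasts can be shown side by side. The Estonian case forms below are the fourteen listed in the text; the Boolean-query helper and the toy search over English machine translations are illustrative sketches, not GDELT code.

```python
# The fourteen Estonian surface forms of "New York" listed in the article.
ESTONIAN_FORMS = [
    "New York", "New Yorki", "New Yorgi", "New Yorgisse", "New Yorgis",
    "New Yorgist", "New Yorgile", "New Yorgil", "New Yorgilt", "New Yorgiks",
    "New Yorgini", "New Yorgina", "New Yorgita", "New Yorgiga",
]

def boolean_query(forms):
    """Expand a single keyword into the massive Boolean query the article mentions."""
    return " OR ".join(f'"{form}"' for form in forms)

def search_translations(keyword, translated_articles):
    """The inverted approach: search English machine translations directly,
    so one query matches regardless of the original Estonian case form."""
    return [a for a in translated_articles if keyword.lower() in a.lower()]

print(boolean_query(ESTONIAN_FORMS))
print(search_translations("New York", ["Officials in New York said on Monday ..."]))
```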
This can have devastating consequences, from missing the earliest warning signs of epidemics to narrowing our understanding of terrorism. When we read about conflicts or narratives in countries that speak languages other than our own, we see those stories only through the lens of those like us. We are never able to actually see the world through the eyes of others.

In contrast, what might it look like to invert this process? Imagine using massive machine translation to live-translate an ever-growing fraction of worldwide news coverage in real time. Within seconds of a news article being published somewhere on earth, it is machine translated into an intermediate semantic structure that captures its meaning in a language-agnostic form, with a live-updating language model used to generate translations of the article into any language of interest. Now keyword searches in a given language can be used to directly search the machine translations of worldwide coverage into that language, ensuring that a search in English for "New York" will return any Estonian-language article using any of the 14 forms above, which the machine translation process will have converted to "New York." Similarly, natural language processing algorithms can operate in their existing languages by simply processing the translated results in the language they were designed for. Thus, any algorithm available for English-language content can be applied directly to the English machine translations of coverage from any other language, making all the world's algorithms available for any language.

Such was the goal of my open data GDELT Project's Translingual initiative, which launched almost four years ago. Unlike traditional machine translation efforts that merely translate single documents on demand, the goal of Translingual is to translate the world's news coverage in real time, seconds after it is published, from 65 languages (soon to be over 100) representing up to 98.4% of non-English online news coverage. Every article is translated into English using an iterative contextual clarification process akin to true translation, rather than the mere "interpretation" we associate with machine "translation" today. Natural language processing algorithms natively designed for each language are run on the original content as-is, but the English translations allow GDELT to uniformly run the same algorithms across every news article, regardless of its original source language, essentially bridging the linguistic divide when it comes to automated text mining.

To understand the critical importance of machine translation in understanding the world around us, the first map below shows every distinct location in which GDELT identified a mention, among the more than 7.1 billion geographic references across 850 million worldwide news articles it monitored from 2015 to the present.

[Map: Locations mentioned in global news coverage monitored by GDELT 2015-2018. Kalev Leetaru]

The second map colors each of those points by the most common language of news coverage mentioning it (via the 65 languages GDELT currently translates from). While news coverage in languages across the world likely mentions Paris, France at least once in the course of a year, the city is most commonly mentioned in French-language news coverage, reflecting the geographic locality of journalism.
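As a toy illustration of how the "primary language" coloring just described could be derived, the sketch below takes (location, language) mention records and keeps the most common language per location. The records are invented; GDELT's actual pipeline is of course far larger.

```python
# Toy aggregation: most common coverage language per mentioned location.
from collections import Counter, defaultdict

mentions = [  # (location, language of the article mentioning it) -- invented data
    ("Paris", "fr"), ("Paris", "fr"), ("Paris", "en"),
    ("Tallinn", "et"), ("Tallinn", "et"), ("Tallinn", "ru"),
]

by_location = defaultdict(Counter)
for location, language in mentions:
    by_location[location][language] += 1

primary_language = {loc: counts.most_common(1)[0][0] for loc, counts in by_location.items()}
print(primary_language)  # {'Paris': 'fr', 'Tallinn': 'et'}
```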
[Map: Locations mentioned in global news coverage monitored by GDELT 2015-2018, colored by the primary language of coverage mentioning each location. Kalev Leetaru]

Perhaps most readily apparent in this map is how little of the world's surface is covered primarily by the English-language press. In other words, to truly understand the local stories and narratives across the world, you must look beyond English to local sources in local languages. The rich, colorful vibrancy of this map reminds us of just how diverse our shared world is and how much information we miss by focusing only on the language(s) we ourselves speak. Leveraging this model, companies are increasingly combining this mass machine translation approach with selective machine translation to expand their reach into local events and narratives.

In the end, putting this all together, we live in an era where machine translation, while far from perfect, is both scalable and accurate enough to let us machine translate the world's news coverage in real time, enabling language-agnostic searching and data mining. As machine translation continues to improve rapidly, we are increasingly able to see the world through the eyes of others.

Neural Machine Translation (NMT): artificial intelligence in the service of translation

What is neural machine translation?

Neural machine translation (NMT) is a technology based on artificial neural networks. It has made considerable progress in recent years thanks to artificial intelligence, and it can now serve as the basis for certain professional translations. Neural machine translation makes it possible to translate millions of pieces of information in real time, with accuracy and reliability now approaching those of a human being. While we are already familiar in our daily lives with machine translation software such as Google Translate, artificial intelligence is changing the game. Like the human brain, the machine is now capable not only of producing a reliable translation but also of learning a language, and thus of constantly improving the quality of what it translates.

To increase the "machine's" performance, it is trained by humans. Concretely, this means feeding the machine a very large volume of quality data (words, sentence segments, and previously translated texts) in order to improve the reliability and precision of its results. A "machine" can also be trained to meet the specific needs of a sector (legal translation, medical translation, etc.) or of a client's line of business, with its own domain vocabulary.

[Diagram: How NMT works (OpenNMT community)]

Do you have data in a foreign language that you would like translated quickly?
> Contact us for an audit of your data.

Combining technology and human translation at Ubiqus

While human translation and machine translation were long set against each other, it now makes sense to combine them. To guarantee quality translations, machine-translated content must be adapted and proofread by a professional translator. This step of verification, adaptation, and correction is called post-editing. It aims to make the final content intelligible and fluent.

Machine translation

[Image: Georges Artsrouni's mechanical brain, a translation device patented in 1933 in France.]

This page is for the 2018 offering of this course, and is here for archival purposes. This course will no longer be offered. Instead, it will be merged with Natural language understanding and Natural language generation into a new 20-point second-semester NLP course: Natural language understanding, generation, and machine translation.

Course Description

Google Translate instantly translates between any pair of over eighty human languages, like French and English. How does it do that? Why does it make the errors that it does? And how can you build something better? Modern translation systems like Google Translate learn to translate by reading millions of words of already translated text. This course will show you how they work. We cover fundamental building blocks from machine learning, computer science, and linguistics, showing how they apply to a real and difficult problem in artificial intelligence.

Time and Place

Mondays 16:10 to 17:00, Medical School, Room 425 Anatomy Lecture Theatre - Doorway 3
Thursdays 16:10 to 17:00, Medical School, Room 425 Anatomy Lecture Theatre - Doorway 3

Teaching Team

Rico Sennrich (office hours: 3:00 Mondays, Absorb Cafe, starting week 3), Alham Aji, Jonathan Mallinson, Ida Szubert, Denis Emelin. Ask us questions on Piazza. But answer questions too.

Textbook

There is no required textbook. The course will draw on recent literature from this fast-moving field. However, some background will be drawn from the following books:
Neural Machine Translation by Philipp Koehn. Available online.
Deep Learning by Goodfellow, Bengio, and Courville. Available online.
Linguistic Fundamentals for Natural Language Processing by Emily Bender. Available electronically from the university library.

Assessment

The assessment will consist of a practical coursework assignment, due in week 8 (30%), for which you are encouraged to work in pairs, and a final exam in the April/May diet (70%): April 30th, 14:30-16:30, Appleton Tower Concourse. The course will follow the school-wide late coursework policy and academic conduct policy. Past exam papers are available here.

Prerequisites

The course assumes you have taken ANLP or equivalent. Machine translation applies concepts from computer science, statistics, and linguistics. You needn't be an expert in all three of these fields (few people are), but if you are allergic to any of them you should not take this course. Concretely, you will be expected to already understand the following topics before taking the course, or be prepared to learn them independently.
Discrete mathematics: analysis of algorithms, dynamic programming, basic graph algorithms.
Other essential maths: basic probability theory; basic calculus and linear algebra; ability to read and manipulate mathematical notation including sums, products, log, and exp.
Programming: ability to read and modify Python programs; ability to design and implement a function based on a high-level description such as pseudocode or a precise mathematical statement of what the function computes.
Linguistics: basic elements of linguistic description.

Course catalogue: INFR11062 Informatics: MT
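As a pocket example of the dynamic programming background the prerequisites above call for (my illustration, not course material), here is the classic edit-distance recurrence, which also underlies MT evaluation metrics such as TER:

```python
# Edit distance by dynamic programming: d[i][j] = min edits turning a[:i] into b[:j].
def edit_distance(a: str, b: str) -> int:
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete everything
    for j in range(len(b) + 1):
        d[0][j] = j  # insert everything
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(a)][len(b)]

print(edit_distance("translation", "translating"))  # 2
```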
Machine Translation Technology

Pangeanic was the first translation company in the world to make commercial use of the statistical machine translation system Moses, as reported at the Association for Machine Translation in the Americas (AMTA) in 2010 and in the European Union project EuroMatrixPlus. Nowadays, Pangeanic's neural machine translation engines are first of their class and have been chosen by US government agencies, the European Union and Member States, and many translation companies. Dozens of corporations, businesses, and language service providers have benefited from a flexible approach that is user-centric and gives users the highest levels of control, customization, and ownership.

Pangeanic has developed and used machine translation for many applications. It has reported successful use cases for many of its clients at industry events such as Localization World Barcelona 2011, Localization World Paris 2012, and Localization World London 2013, as well as at numerous TAUS summits in the United States, Europe, and Japan, META Forum Berlin 2013, and the Japan Translation Federation.

[Figure: Pangeanic's syntax-based hybrid machine translation.]

Pangeanic was also one of the largest donors of training data to TAUS, which in turn provided access to millions of words of training corpus. This enhanced the PangeaMT platform and gave our team the opportunity to experiment further with millions and millions of aligned sentences. Machine translation has been part of company culture since 2009. Since then, machine translation services to corporations and even other translation companies have become part of Pangeanic's range of services. From 2012 to 2016, Pangeanic was a member of the EU's Marie Curie action EXPERT Project, advancing the state of the art with young and experienced researchers. PangeaMT is Pangeanic's own, independent translation technology division with a clear focus on customized, domain-specific machine translation (MT). The current version of the platform is v3.

HISTORY

As a forward-thinking and technology-savvy translation company, Pangeanic won a post-editing contract in 2007 to work for the European Commission as MT output post-editors. It was at this time that we became acquainted with institutional user needs and (re-)evaluated several commercial MT products we had been using. Soon we decided to develop our own machine translation technology. Pangeanic was cited as the first language service provider to make commercial use of Moses in the EU's Framework development program euromatrixplus.net (the second, more polished release of Moses). Since then, many presentations, awards, and implementations have followed, and Pangeanic has made a name for itself as a leading machine translation implementation company. It also markets its machine translation services in areas beyond the translation industry and has been heavily involved in two more EU machine translation R&D programs, EXPERT and Casmacat (User Group). Pangeanic obtained the biggest contract for machine translation infrastructures for the European Commission (2017) with its iADAATPA project. Neural machine translation technology has been integrated into Pangeanic's workflow to give its clients faster translation turnarounds. Its neural network-based engines also serve EU projects, US government agencies, and international companies, in the cloud and on premises.

FOCUS

We began as keen followers of the statistics-driven paradigm of machine translation. This worked very well for several related language pairs (Romance languages and English; German and Scandinavian languages). However, our links to Japanese industry soon brought requests to add Japanese and Chinese to our service portfolio. In 2011, Pangeanic developed hybrid machine translation services, which were included as part of the system features.

FEATURES

Despite our Moses bias, we have been able to overcome many of Moses' shortcomings in order to fit the needs of the translation industry: our solutions go beyond text-based MT and can take input and produce output in industry standards such as TMX and XLIFF. PangeaMT provides API access to other translation platforms, so you do not need to change your translation environment and can benefit from feeding your future translations back in a virtuous re-training cycle. Using open standards means that you will never have to buy expensive TM software again; our solutions simply avoid having you locked in by expensive upgrades year after year.

Another PangeaMT breakthrough is our inline mark-up parser. PangeaMT handles tags extremely efficiently. Statistical machine translation systems (as they come from open-source releases) usually produce plain-text output, because this is also the format they process. However, we are keen to see PangeaMT solutions in use and adapted to the most demanding language-industry requirements, so we focused our effort on developing SMT engines capable of handling the in-line coding typical of other content formats used in localization production environments. Thanks to this parser, PangeaMT can identify in-lines without attempting to translate them, and it places them back in the resulting text, too. An in-line placeholder acts first by copying and transferring all XML and code information to a separate module. The translation engine does its work, and the in-line is then placed back into the translated segment.
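Pangeanic's parser is proprietary, but the placeholder mechanism just described can be sketched in a few lines: tags are stashed before translation and restored afterward. Everything here, including the placeholder format, is an assumption for illustration.

```python
# Sketch of inline mark-up handling: stash tags as placeholders, translate, restore.
import re

TAG = re.compile(r"<[^>]+>")

def extract_tags(segment):
    tags = []
    def stash(match):
        tags.append(match.group(0))
        return f"__TAG{len(tags) - 1}__"  # hypothetical placeholder format
    return TAG.sub(stash, segment), tags

def restore_tags(translated, tags):
    for i, tag in enumerate(tags):
        translated = translated.replace(f"__TAG{i}__", tag)
    return translated

source, tags = extract_tags("Press <b>Start</b> to begin.")
# ... the MT engine translates `source`, leaving the placeholders intact ...
translated = "Appuyez sur __TAG0__Démarrer__TAG1__ pour commencer."
print(restore_tags(translated, tags))  # Appuyez sur <b>Démarrer</b> pour commencer.
```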
At the time of its release, our in-line parser constituted an innovation well above the level of maturity of well-known SMT systems. We keep learning and improving with every development commissioned by an existing or new client and language combination. We therefore remain open to applying new hybridization techniques, even ad-hoc rules, which we research and implement ourselves or co-develop with our clients. We are aware that for some language combinations it will be necessary to resort to linguistically informed techniques as part of the pre- or post-processing phases. Getting word and phrase reordering right in the MT output is not an easy goal, especially when the languages involved are not closely related from a linguistic-family standpoint, or when one of them has a really flexible, and therefore MT-challenging, word order. Some language-specific fixing procedures may come in handy. In other cases, it may be useful to use one language as a pivot to train engines between languages that are not close. These and other techniques may be used, or taken as a basis, for expanding our PangeaMT solution palette. Please visit our machine translation division website to learn more about PangeaMT.

Neural Machine Translation

It is generally agreed that neural machine translation (NMT) has surpassed statistical machine translation (SMT) in terms of the fluency and adequacy that humans perceive when reading the texts the software produces. NMT uses a large artificial neural network with thousands of connections, loosely resembling what happens in the human brain. One of the main advantages of NMT is that the context it considers is much longer than in SMT, which translates at the phrase level. Currently, developers mostly use sequence-to-sequence approaches, in which the full context of the sentence is taken into account. Accuracy and fluency of the translations increase with the use of NMT. Other advantages of NMT with respect to SMT are that NMT requires only a fraction of the memory needed by SMT, and that all parts of an NMT model are trained jointly (end to end) in order to maximize translation performance. Pangeanic is at the forefront of research and development of translation technologies incorporating NMT, embedding it in different processes.

Statistical and neural network automatic translation software

Portage helps translators boost productivity and improve the quality of their work by generating automatic translations that draw on their own documents. Using statistical machine learning technology, Portage creates ever more accurate translations the more it is used. Because Portage uses your archives rather than external resources, the translations it generates are considerably more accurate than those of other automatic translation systems. For each sentence translated by Portage, a confidence index is produced, which allows users to filter translated output based on quality. To obtain sufficient quality, we recommend training Portage on a corpus of at least 5 million words.

Systems Comparison

Many clients report having translated entire documents with an accuracy rate of 70-80% in certain subject areas. This kind of accuracy means that a language professional equipped with Portage can realistically expect to perform revision, rather than translation.
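A per-sentence confidence index like Portage's naturally supports a routing workflow: high-confidence output goes to revision, the rest to full retranslation. The threshold, data shapes, and sample segments below are assumptions for illustration, not Portage's actual interface.

```python
# Sketch: route MT output by a per-sentence confidence index.
CONFIDENCE_THRESHOLD = 0.8  # hypothetical cut-off

segments = [  # (machine translation, confidence index) -- invented examples
    ("Le rapport est prêt.", 0.93),
    ("La configuration du pipeline a échoué.", 0.55),
]

to_revise = [text for text, conf in segments if conf >= CONFIDENCE_THRESHOLD]
to_retranslate = [text for text, conf in segments if conf < CONFIDENCE_THRESHOLD]
print("revise:", to_revise)
print("retranslate:", to_retranslate)
```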
Portage is integrated into LogiTerm, Terminotix's computer-aided translation software. You can use LogiTerm's pretranslation settings to activate Portage when working with LogiTerm's pretranslation engine: if a match is not found for a given text segment, LogiTerm will display an automatic translation for the segment (if a specified correspondence threshold is met). Portage supports the TMX and SDLXLIFF file formats and retains output formatting codes. It also features a SOAP interface, which allows for integration with any computer-aided translation software or any other platform. Portage software can be installed on your servers or hosted on Terminotix's servers.

Title: Generative Neural Machine Translation
Authors: Harshil Shah, David Barber (submitted 13 Jun 2018)
Abstract: We introduce Generative Neural Machine Translation (GNMT), a latent variable architecture which is designed to model the semantics of the source and target sentences. We modify an encoder-decoder translation model by adding a latent variable as a language agnostic representation which is encouraged to learn the meaning of the sentence. GNMT achieves competitive BLEU scores on pure translation tasks, and is superior when there are missing words in the source sentence. We augment the model to facilitate multilingual translation and semi-supervised learning without adding parameters. This framework significantly reduces overfitting when there is limited paired data available, and is effective for translating between pairs of languages not seen during training.
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1806.05138 [cs.CL] (or arXiv:1806.05138v1 [cs.CL] for this version)
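For orientation, latent-variable translation models of the kind the abstract describes are typically trained by maximizing an evidence lower bound (ELBO). One plausible shape for it, writing x for the source sentence, y for the target, and z for the latent representation, is the generic variational form below; this is not necessarily the paper's exact objective.

\[
\log p_\theta(x, y) \;\ge\; \mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(x \mid z) + \log p_\theta(y \mid x, z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x, y) \,\|\, p(z)\big)
\]

The KL term regularizes the latent representation, which is consistent with the reduced overfitting on limited paired data that the abstract reports.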
Is Neural Machine Translation Ready for Marketing Content?
By Terena Bell | Sep 28, 2018

When Google Translate first hit the market, it wasn't very good. Music fans were among the first to prove this, making a laughingstock of the app by loading in lyrics from songs like Will Smith's "Fresh Prince of Bel-Air" and the theme song from Moana to see what funny or ridiculous translations Google would generate. While the tool isn't nearly as bad as the videos make it out to be, this negative PR has kept companies from using it. After all, if Google can't translate song lyrics correctly, why would you trust it with marketing content? But Google Translate doesn't represent all machine translation; it is simply a brand that happens to be well known and free. And in translation, just like everything else, you get what you pay for.

Machine Translation Post-Editing, MT+PE

Professional (not free) machine translation helps companies translate more content more quickly and at a lower price point. And when a human reviewer checks the work, quality is found to be just as high as that of traditional translation. The language industry calls this pairing machine translation post-editing, or MT+PE for short. Traditionally, when you buy translation, your content goes through two rounds: the first person converts it into the new language, then a second checks for errors. With MT+PE, the computer takes the first pass, speeding up the entire process. This is what makes machine translation okay for certain marketing material, but not for anything too highly nuanced, cautions Rick Antezana of the Association of Language Companies. "Paid machine translation should be used when translating content of high volume and low risk/low importance," he says.

It all depends on the training. Words are stored in what computer programmers call an engine. Machine translation engines must be specifically trained for the old and new languages, as well as for the subject matter you need. As with all artificial intelligence or machine learning, machine translation needs data to improve. If your company's tech team uses the engine to translate user questions, it'll be good at support language but not marketing. And just because an engine is trained in the language you need doesn't mean it can translate in the right direction: Spanish into English has traditionally required a different engine than English into Spanish.

Enter Neural Machine Translation

But that's changing. Laura Brandon, former director of the Globalization and Localization Association, says, "The big development these days is neural machine translation, which is blowing other machine translation out of the water." Using neural network technology (a type of machine learning designed to mimic neurons in the human brain), neural machine translation can train in multiple languages and directions at once. Separate engines are no longer needed, saving your company precious training time. So is neural network technology better at translating marketing content? Not necessarily. "When representing any kind of brand — whether it be an enterprise-level, global company or a small software company — using machine translation to represent any kind of content, including marketing content like social is a big gamble, as the software is incredibly well developed, but never perfect," Antezana says. "The fewer eyeballs on potential content for translation and the higher the volume, the more appropriate it would be."
JMIR Public Health Surveill. 2015 Jul-Dec; 1(2): e17. Published online 2015 Nov 17.
doi: 10.2196/publichealth.4779. PMCID: PMC4869219. PMID: 27227135

Machine Translation of Public Health Materials From English to Chinese: A Feasibility Study

Monitoring editor: Gunther Eysenbach. Reviewed by Daniel Capurro, Yoonsang Kim, and Barbara Massoudi.

Anne M Turner, MD, MLIS, MPH (corresponding author),^1 Kristin N Dew, MS,^2 Loma Desai, MS, MBA,^3 Nathalie Martin, BA,^4 and Katrin Kirchhoff, PhD^5

^1 Northwest Center for Public Health Practice, Department of Health Services, University of Washington, Seattle, WA, United States
^2 Northwest Center for Public Health Practice, Human Centered Design & Engineering, University of Washington, Seattle, WA, United States
^3 Northwest Center for Public Health Practice, Information School, University of Washington, Seattle, WA, United States
^4 Northwest Center for Public Health Practice, Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, United States
^5 Speech, Signal and Language Interpretation (SSLI) Lab, Department of Electrical Engineering, University of Washington, Seattle, WA, United States

Corresponding author: Anne M Turner, Northwest Center for Public Health Practice, Department of Health Services, University of Washington, Suite 400, 1107 NE 45th Street, Seattle, WA, 98105, United States. Phone: 1 206 491 1489. Fax: 1 206 616 5249. Email: amturner@uw.edu.

Received 2015 May 29; revisions requested 2015 Jul 8; revised 2015 Aug 18; accepted 2015 Oct 7.

Copyright © Anne M Turner, Kristin N Dew, Loma Desai, Nathalie Martin, Katrin Kirchhoff. Originally published in JMIR Public Health and Surveillance (http://publichealth.jmir.org), 17.11.2015. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on http://publichealth.jmir.org, as well as this copyright and license information must be included.

Abstract

Background

Chinese is the second most common language spoken by limited English proficiency individuals in the United States, yet there are few public health materials available in Chinese.
Previous studies have indicated that the use of machine translation plus postediting by bilingual translators can generate quality translations in less time and at lower cost than human translation.

Objective

The purpose of this study was to investigate the feasibility of using machine translation (MT) tools (eg, Google Translate) followed by human postediting (PE) to produce quality Chinese translations of public health materials.

Methods

From state and national public health websites, we collected 60 health promotion documents that had been translated from English to Chinese through human translation. The English versions of the documents were then translated to Chinese using Google Translate. The MTs were analyzed for translation errors. A subset of the MT documents was postedited by native Chinese speakers with health backgrounds. Postediting time was measured. Postedited versions were then blindly compared against human translations by bilingual native Chinese quality raters.

Results

The most common machine translation errors were errors of word sense (40%) and word order (22%). Posteditors corrected the MTs at a rate of approximately 41 characters per minute. Raters, blinded to the source of translation, consistently selected the human translation over the MT+PE. Initial investigation into the reasons for the lower quality of MT+PE indicates that poor MT quality, lack of posteditor expertise, and insufficient posteditor instructions can be barriers to producing quality Chinese translations.

Conclusions

Our results revealed problems with using MT tools plus human postediting for translating public health materials from English to Chinese. Additional work is needed to improve MT and to carefully design postediting processes before the MT+PE approach can be used routinely in public health practice for a variety of language pairs.

Keywords: public health informatics, public health, natural language processing, machine translation, Chinese language, health promotion, public health departments, consumer health, limited English proficiency, health literacy

Introduction

A key role of public health departments is to inform and educate the public on issues of public health importance. Health departments produce health promotion materials on a range of topics, such as environmental health, communicable diseases, immunizations, and maternal-child health, and the Internet has become a key mechanism by which they distribute and disseminate this information. Although federal and state regulations require that health materials be made available in the languages of patients, due to the time and costs required to manually produce quality translations, very few of these materials are available in languages other than English [1]. Therefore, individuals with limited English proficiency (LEP) have limited access to this health information. This is of particular significance given that LEP status is associated with poor health literacy and negative health consequences, including documented health disparities such as poorer health outcomes and poorer access to health care and preventive services compared to English-speaking minorities [2-4].

Machine translation (MT), the automatic translation of text from one human language into another by a computer program, has been an area of study within natural language processing for several decades. State-of-the-art MT tools use a statistical machine translation (SMT) framework. This approach uses large amounts of parallel text for the desired language pair to train SMT models. During testing, an SMT engine then produces the most likely translation under the statistical model.
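Stated compactly, the classical decision rule behind this SMT framework is the noisy-channel formulation: given a foreign sentence f, the decoder searches for the English sentence e maximizing

\[
\hat{e} \;=\; \arg\max_{e} P(e \mid f) \;=\; \arg\max_{e} \; P(f \mid e)\, P(e),
\]

where the translation model P(f | e) is estimated from the parallel text and the language model P(e) from monolingual text.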
While MT tools have improved greatly over the last 5 years, and MT is now routinely used by many language service providers, the quality of raw MT output generally falls short of human-generated translation (HT). In order to produce quality translations, MT errors need to be corrected by human readers who have domain expertise and are fluent in the source and target languages. This correction, called postediting (PE), can range from light to heavy editing. It has been shown that MT+PE increases productivity (ie, it can be completed more quickly than producing an entirely new HT) both for translators and for lay users [5]. However, postediting is a cognitively different process from translating, and postediting results depend strongly on posteditor skill, attitudes toward machine translation, the difficulty of the source document, and the quality of the initial machine translation output [5,6].

Our previous research indicates that freely available MT tools, such as Google Translate and Microsoft Translator, can be used in conjunction with human PE to produce quality translations efficiently and at low cost [7,8]. We compared the time and cost of HT versus MT+PE for Spanish public health documents, using health professionals as posteditors [7]. Posteditors corrected 25 machine-translated public health documents. Pairs of HT and MT+PE were blindly presented to 2 bilingual public health professionals, who were asked to rate which of the translations they preferred. In this blinded rating, the HT and MT+PE were found to be overall equivalent (33% HT preferred, 33% MT+PE preferred, 33% both translations considered equivalent).

These previous studies were conducted on a single language pair, English-Spanish. SMT generally works best when the source and target languages have similar sentence structures, as in the case of English-Spanish. In order to assess the broader usefulness of MT technology in public health departments, it is necessary to determine whether these results generalize to a wider set of language pairs, specifically those with very divergent linguistic structures. One such pair, English-Chinese, is of particular interest since Chinese is the second most common language spoken by LEP individuals in the United States, representing 6.1% of the LEP population [9]. We conducted postediting experiments, similar to those conducted for the English-Spanish pair, in order to determine the feasibility (accuracy and efficiency) of using MT+PE for translating public health documents from English to Traditional Chinese. We investigated the types of MT errors occurring in Chinese, the PE time needed to correct them, and the quality of MT+PE compared to HT, as rated by raters fluent in both English and Traditional Chinese. In this paper, we discuss the results of these investigations and compare them to our previous experience with the English-Spanish pair. This work contributes to our understanding of the challenges involved in applying the MT+PE approach in a public health setting.

Methods

Initial Steps

We collected 60 health promotion documents from different public health agencies in the United States that had been translated manually (HT) from English to Chinese.
Translations were created using the Traditional Chinese character set, as opposed to Simplified Chinese, because this is the form known to most Chinese LEP individuals in the Pacific Northwest region. We identified the types of linguistic errors present in MT from English to Chinese and then conducted postediting of the translated materials with participants fluent in both languages. Next, we had bilingual public health professionals and laypersons rate the quality of the human translations versus the MT plus postedited documents. A diagram of the study design is shown in Figure 1. A more detailed description of the specific methods for the linguistic error analysis, the postediting and rating studies, and the follow-up evaluation is provided below.

[Figure 1. Study design overview.]

Linguistic Error Analysis

We collected 60 health promotion documents available in English and Chinese (Traditional) from public health websites in the United States. Websites included those of the Centers for Disease Control and Prevention, New York City Department of Health and Public Health, Minnesota Department of Health, Washington State Department of Health, Department of Public Health - Los Angeles County, and Public Health - Seattle & King County. All Chinese versions of these documents had been translated manually (HT) by health department translators or professional translation vendors. The English versions of the documents were then translated into Traditional Chinese using Google Translate. We developed a categorization scheme for MT errors, and all MTs were annotated based on this scheme by a native Chinese speaker with formal training in linguistics. Subsequently, aggregate error statistics were computed to identify the most frequent error categories: word sense, word order, missing word, superfluous word, orthography/punctuation, particle error, untranslated word, pragmatic error, and other grammar error.

Postediting Experiments

For the postediting studies, we selected 25 of the 60 health documents that had been machine translated from English to Chinese using Google Translate. To ensure a wide representation of topics, we selected the documents based on the length of the English version (340-914 words) and topic area. From the memberships of local Chinese cultural organizations, 6 Chinese translators were recruited for postediting and screened for language ability and health experience. Posteditors, all native Chinese speakers, were fluent in oral and written Traditional Chinese and English, had varying levels of translation experience, and had prior experience in a health-related field (Table 1).
Table 1. Initial postediting and quality rating participants, and their health and translation experience.
(Columns: participant number | role | health background | translation experience)
P1 | Posteditor | Pharmacy student | Limited: translating at health fairs
P2 | Posteditor | Social work for Chinese population, including health care support | Teaching English as a second language and translating research
P3 | Posteditor and quality rater | Public health researcher | 10 years of various translation experience
P4 | Posteditor | Social work for Chinese population, including health care support | Translating agency and government publications for distribution to clients
P5 | Posteditor | Public health student | None
P6 | Quality rater (posteditor for follow-up evaluation only) | Public health translator | DSHS Certified Medical Interpreter

The 25 machine-translated documents were each corrected by at least 2 posteditors in order to permit consistency checks across posteditors and computation of average time, adequacy, and fluency ratings per document. Posteditors used a proprietary MT and postediting tool built for the purpose of this study, as described previously [7]. Each posteditor corrected between 4 and 21 documents representing common types of public health materials, including informational webpages, agency letters, fact sheets, and brochures. Posteditors were allowed to choose their preferred character input method: one posteditor used a pinyin keyboard called Q9, while the rest used the standard Windows OS pinyin input.

The postediting tool displays three versions of the text from left to right in one window: the original English text, the MT, and the editable MT. When a posteditor clicks the editable MT field to begin editing, a timer starts. The tool saves the total editing time (minus pauses), the keystrokes, and a copy of the postedited machine translation. Time and keystroke data were collected for all postedited documents. Due to a posteditor saving error, only 24 of the 25 postedited documents were output in a readable format and therefore available for rating.

Posteditors were given written and verbal instructions to "perform all corrections necessary to ensure that the text (1) is consistent with the grammar rules of Chinese, (2) adequately represents the meaning of the English text, (3) is culturally appropriate (ie, not unintentionally funny or offensive), and (4) preserves the linguistic style of the source document." Posteditors were asked not to alter a correct, appropriate translation simply because it might not correspond to their first choice of translation. In short, they were instructed to correct only as much as necessary and not to rewrite the text. These were the same instructions used in the previous Spanish study.

After completing postediting, participants were asked to fill out a questionnaire rating the adequacy and fluency of each MT+PE on a scale of 1-5. These rating scales are common in human evaluations of machine translation quality [10]. An adequacy of 1 indicated that none of the original meaning of the English source text was retained in the MT, while an adequacy of 5 indicated that all of the meaning was retained. A fluency rating of 1 indicated that the MT was incomprehensible, while a rating of 5 indicated flawless Chinese. The questionnaire also asked participants to describe the common translation errors they found, identify which errors were most difficult to correct, and explain which errors took the longest to correct.
Quality Rating

Two public health professionals, blinded to the method of translation, compared the quality of the postedited documents to that of the HT documents from the health department websites. The quality raters were asked to rate the MT+PE against the HT versions. One rater was a professional public health translator and Department of Social and Health Services Certified Medical Interpreter at a local clinic; the other was a health researcher (Table 1). They were presented with 20 sets of documents selected from the 24 available, each set containing an original English text, an HT version of that text, and an MT+PE version of the text. Although one rater had participated in the initial postediting study as well, she did not rate documents that she had encountered while postediting. The documents were not labeled as human- or machine-translated, and the order in which they were presented in each set was randomized. Using a questionnaire, we asked the quality raters to read each set carefully, indicate which of the translated versions they preferred, and describe why they chose that version, based on five dimensions: grammar, adequacy, word choice, cultural appropriateness, and reading level.

Follow-Up Evaluation

After analyzing the results of the quality rating study, we performed follow-up evaluations of the effects of posteditor expertise, engagement, and instructions on the quality of postedited translations. To assess whether posteditors' public health and translation expertise affected the quality rating outcome, we asked P6, a highly trained and experienced health translator, to postedit four documents. We then repeated the quality rating procedure with those documents, asking five native Chinese speakers to review them. To test posteditor engagement, and whether the instructions to edit only as necessary were problematic, we asked 3 posteditors (P2, P4, and P5) to return and edit a total of 10 more documents, this time with instructions to make as many corrections as needed to ensure the quality of the translation. We again repeated the quality rating procedure with one native Chinese speaker with public health experience to see whether posteditors given the revised instructions would produce text equivalent to the HTs.

Results

Linguistic Error Analysis

Results from the linguistic error analysis are summarized in Table 2. The left-hand column shows the error type; the right-hand column shows the corresponding frequency, computed as the percentage of all errors annotated in the total set of 60 documents. For example, word sense errors (errors where the word meaning was translated incorrectly) constituted 40% of all annotated errors. The next most common error types involved word order (22%) and missing words (16%).

Table 2. Error categories and their distributions.
Word sense: 40%
Word order: 22%
Missing word: 16%
Superfluous word: 14%
Other grammar error: 3%
Orthography/punctuation: 3%
Particle error: 1%
Untranslated word: 0.03%
Pragmatic error: 0.01%

Postediting Experiments

The proprietary postediting tool recorded the time taken to postedit each machine-translated document. We analyzed the time taken, by document and by posteditor, and examined posteditors' quality ratings of the initial MT output. A list with descriptions of the source documents is provided in Multimedia Appendix 1.
To determine and analyze the amount of time required for postediting, we calculated the number of characters per minute (CPM) for each document and then computed means and standard deviations (SDs) in CPM for each document, using posteditors' recorded times. In addition, we computed means and SDs in CPM for each posteditor (Table 3). This helped us gain insights into potential correlations between postediting time and document topic, length, and so on, as well as differences between posteditors (though not all posteditors edited the same number of documents).

Table 3. Postediting time, adequacy, and fluency ratings by posteditor.
(Columns: posteditor | docs postedited, n | CPM, mean (SD) | avg. adequacy | avg. fluency)
P1 | 9 | 34.2 (7.3) | 4 | 3.2
P2 | 21 | 35.4 (16.2) | N/A | N/A
P3 | 4 | 25.8 (10.2) | 3 | 2.5
P4 | 4 | 54.3 (40.5) | 3.25 | 3.25
P5 | 11 | 54.0 (16.0) | 3.875 | 3.75
P6 | 4 | 20.6 (3.7) | 1.75 | 1.625

The mean CPM per document varied greatly, from 18.5 to 79.6 CPM (SD 0.03-38.7). The total mean CPM across all documents was 37.8 (SD 10.2). Thus, on average, a posteditor corrected approximately 38 CPM, with a variation of around 10 CPM. The results did not indicate a linear relationship between document length and average postediting time. We also found no relationship between document type and average CPM.

On average, the posteditors rated the adequacy of the translations at 3.32 (SD 0.90), suggesting that much of the original meaning of the source text was preserved in the MT. The average fluency rating was 3.0 (SD 0.84), which corresponds to the grammar quality level of non-native Chinese. The average adequacy and fluency ratings bore no relationship to document type or length, but varied greatly by individual posteditor. Interestingly, the posteditors with more experience in translation and health rated adequacy and fluency lower than did their less experienced counterparts (Table 3).

To investigate the variation in postediting speed between individuals, we calculated the average CPM for each posteditor. As shown in Table 3, the average CPM was 37.4 and the average SD for CPM per document was 15.7. We also found large individual differences in speed among posteditors [11,12]. Posteditors also varied widely in their adequacy and fluency ratings, with a trend indicating an inverse relationship between public health translation experience and ratings: the posteditors with more translation and public health expertise tended to rate the documents they postedited lower than did those with less experience (Tables 1 and 3).

Errors described by posteditors as difficult to correct, or annoying, included word sense errors and word order errors. Some examples of the errors noted by posteditors are provided in Table 4.

Table 4. Posteditor examples of the top three error categories.
Word sense: "The literal meaning changes when translated into Chinese (eg, lost power/electricity is translated as lost 'energy')"
Word order: "'...when...can't...' type of sentence doesn't have same structure in Chinese. The order of the words change in Chinese and English in many situations"
Missing word: "Whenever there is the word 'person' we should mention 'this' or 'that' person, otherwise it is not clear who are we talking about in the sentence."
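To make the bookkeeping behind Table 3 concrete, the sketch below computes per-posteditor mean and standard deviation in characters per minute from the kind of (posteditor, characters, minutes) records the postediting tool logged. The log entries are invented for illustration, not the study's data.

```python
# Per-posteditor CPM statistics from (posteditor, characters, active minutes) logs.
from statistics import mean, stdev

logs = [  # invented example records
    ("P1", 1200, 33.5), ("P1", 900, 28.0),
    ("P5", 1500, 27.1), ("P5", 1100, 21.3),
]

by_editor = {}
for editor, chars, minutes in logs:
    by_editor.setdefault(editor, []).append(chars / minutes)

for editor, cpms in sorted(by_editor.items()):
    sd = stdev(cpms) if len(cpms) > 1 else 0.0
    print(f"{editor}: mean {mean(cpms):.1f} CPM (SD {sd:.1f})")
```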
Quality Rating

Unlike our previous experience with English to Spanish translations, in a blind comparison of HT and MT+PE, the quality raters selected the HT document as the preferred version for all 20 documents. Reasons given for preferring the HT versions included better word order, a more professional reading level, smoother flow, more accurate word choice, and preservation of the meaning and cultural appropriateness of the original English document. The reasons the raters gave for rejecting the MT+PE documents were that they did not meet the reading level of the general public, that some sentences lost the intended meaning, that the same words were not translated consistently, and that they contained awkward word order, occasional wrong word translations, and awkward flow.

Follow-Up Evaluation

In theory, if posteditors have sufficient training, experience, and resources to perform quality postediting, MT+PE documents should be equivalent to HT documents. The feasibility of MT+PE has been demonstrated repeatedly in previous studies across a variety of language pairs, and the procedure is widely used by commercial language service providers. In previous work with the Spanish-English language pair, we found our approach feasible even among lay users with minimal training; these conditions closely mirror the public health context, where resources for training and calibration are limited. There are several potential reasons for the preference for the HT over the MT+PE in this study:

Differences in MT Quality

Chinese machine translations have a different relative frequency of certain error types and lower quality overall. Compared to our previous studies on English-Spanish [8,13], we found that the Chinese translations had high percentages of word order and word sense errors, which require more cognitive effort to correct [14-16]. Adequacy and fluency ratings were also lower than for the Spanish translations: adequacy for Chinese was 3.3 compared to 4.2 for Spanish; fluency was 3.1 versus 3.7 for Spanish. It should be noted that these scores are not directly comparable, since the sets of English documents used in the two studies were not identical; however, the differences in scores are consistent with the common observation in the MT community that MT for English-Chinese is less effective than for English-Spanish.

Instructions Provided to Posteditors

Posteditors might have misinterpreted the postediting instructions. Specifically, the instruction to "postedit only where necessary" and to not "rewrite" might have led them to produce fewer edits than they would under real-life circumstances. Quality raters observed that the postedited documents often contained very literal, word-by-word translations that were perceived as unacceptable. In language pairs with similar linguistic structures (like English and Spanish), relatively literal translations may still yield acceptable output, whereas fluent Chinese requires the translator to depart more strongly from a literal translation. Due to time and resource constraints, as in our prior studies, there was also no extensive training and calibration phase for the study participants. Combined with the lower quality of the initial Chinese MT versions, the postediting instructions might explain the lower quality of the postedited Chinese translations as compared to the Spanish translations.

Linguistic Expertise of Posteditors

Although posteditors were selected for bilingual competence and familiarity with the domain of public health, they did not have to undergo initial language or translation tests to verify their editing abilities.
Engagement of Posteditors

Posteditors may not have been sufficiently engaged in the task, or they may have optimized for time rather than quality.

Different Levels of Quality Control

Only one round of postediting was performed, followed by the quality rating task. We do not know how many iterations of editing and quality control were applied to the human-generated translations, since they were collected from different sources whose translation processes were not transparent. Our prior investigations into health department translation processes revealed that most of the public health HT documents had been translated in-house or by language service providers who conduct several rounds of postediting and review before making them public [7].

Additional Follow-Up

To ascertain the contribution of these factors to the overall results, we conducted additional follow-up studies investigating the role of posteditor expertise, instructions, and engagement.

Expertise

To assess whether posteditor expertise played a role in the translation quality, we engaged the services of a public health professional who performed translation for a large metropolitan health department in Washington State (P6). She was given the original set of instructions: to correct only as much as needed and not to rewrite the text extensively. She postedited four documents, which were then given as a set and blindly rated against their original human translations by five native Chinese speakers, so that each rater reviewed all four documents. Three of the five raters selected the human translation over the MT+PE version for all four documents; the other two raters judged one of the HT/MT+PE pairs to be equivalent.

Instructions and Engagement

To test whether our instruction to postedit only where necessary played a role in the MT+PE ratings, we modified the instructions to emphasize quality and recruited three posteditors to come back for another postediting session. The original instructions, as adapted from the Spanish study, directed posteditors not to alter a correct translation even if it was not their first choice, not to engage in extensive rewriting of the text, and not to spend an extended period of time looking up grammar, punctuation, or unfamiliar terminology online. The updated instructions directed posteditors to use as much time and effort as necessary to ensure a high-quality translation. The three returning posteditors corrected a total of 10 documents, which were then blindly rated by a quality rater with language and public health expertise. As anticipated, posteditors took longer to produce the MT+PE translations under the updated instructions: P2's average speed dropped from 35.38 to 23.43 CPM, P4's fell from 54.33 to 17.46 CPM, and P5's decreased from 53.96 to 19.69 CPM. The rater chose the human translations for 6 of the 10 documents and rated the other 4 as equivalent, a notable improvement over the results under the original instructions.

Discussion

Principal Findings

Although our prior research on English to Spanish translation indicated that MT+PE could produce translations of equivalent quality in less time and at lower cost, our current study on the English-Chinese language pair showed that maintaining quality through postediting was more problematic.
Translation between English and Chinese is challenging because of very divergent syntactic structures (eg, topic-comment structure in Chinese vs subject-verb-object structure in English), frequent dropping of pronouns in Chinese, richer morphology in English, and other linguistic differences. Compared to a language pair like English and Spanish, SMT for English and Chinese generally tends to produce lower-quality results (eg, in the benchmark evaluations for different language pairs conducted by the US National Institute of Standards and Technology [17]).

Strengths and Limitations

In theory, professional translators with sufficient training and time should be able to produce an equivalent product by postediting MT output. In practice, even with instructions to take the time needed for a best-quality translation, the final postedited translations still contained obvious errors that led the quality raters to prefer the HTs in most cases. The experienced translators rated the adequacy and fluency of the MT+PE lower in general than their less experienced counterparts did, and commented that for many machine-translated sentences it would have been easier to translate the English from scratch than to correct the MT version. However, it should be noted that our prior evaluation of health department translation processes found that HT documents undergo multiple editing cycles to ensure translation quality and cultural appropriateness. In the studies reported here, the machine-translated documents underwent only one round of postediting. It is likely that additional rounds of editing would further improve the MT+PE product. Another possible limitation of our study is the use of a single translation engine, Google Translate. However, most SMT systems are based on the same set of underlying statistical models, suggesting that the types and relative frequencies of translation errors would not have been significantly different had a different SMT system been used. Additional work is needed to improve the quality of MT from English to Chinese; word sense and word order errors require the most attention. Our team is currently working to reduce these errors. In addition, particular care must be taken in selecting posteditors, documents, and machine translation engines, and in designing postediting instructions and quality control processes.

Conclusion

In the United States, Chinese is the second most common language spoken by LEP individuals and the most common character-based language. However, because of the resources and time involved in human translation, health departments currently offer few health promotion materials in Chinese. Our investigation into the use of MT+PE to produce translations indicates that the methods that worked for English to Spanish translation were not as effective for translation from English to Chinese. Multiple factors, including the quality of the MT and the expertise of the posteditors, may have contributed to these results. Our preliminary follow-up studies suggest that reducing word sense and word order errors would improve English to Chinese MT, and that additional training and expertise of bilingual posteditors may be needed to successfully apply online MT technology to public health practice. We are performing additional studies to determine how best to improve translation from English to Chinese in order to ensure quality translation at low cost.
Acknowledgments

The research reported here was supported by the National Library of Medicine of the National Institutes of Health (NIH) under award number R0110432704. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The images used in Figure 1 were created by Hadi Davodpour, Edward Boatman, and Lauren Manninen for the Noun Project. We would also like to thank Beryl Schulman and Julie Loughran for reviewing this manuscript.

Abbreviations

CPM: characters per minute
HT: human translation
LEP: limited English proficiency
MT: machine translation
NIH: National Institutes of Health
PE: postediting
SMT: statistical machine translation

Multimedia Appendix 1: Study source documents and postediting times.

Conflicts of Interest: None declared.

References

1. Turner A, Capurro D, Kirchhoff K. The availability of translated public health materials for limited English proficiency populations in Washington State. 3rd Annual Health Literacy Research Conference; 2011; Chicago, IL. http://www.bumc.bu.edu/healthliteracyconference/files/2011/07/Poster-Abstracts-Packet.pdf
2. Raynor EM. Factors affecting care in non-English-speaking patients and families. Clin Pediatr (Phila). 2015 May 11. doi: 10.1177/0009922815586052.
3. Ponce NA, Hays RD, Cunningham WE. Linguistic disparities in health care access and health status among older adults. J Gen Intern Med. 2006 Jul;21(7):786-91. doi: 10.1111/j.1525-1497.2006.00491.x.
4. Sentell TL, Tsoh JY, Davis T, Davis J, Braun KL. Low health literacy and cancer screening among Chinese Americans in California: a cross sectional analysis. BMJ Open. 2015;5:1-9.
5. Aranberri N, Labaka G, Diaz de Ilarraza A, Sarasola K. Comparison of post-editing productivity between professional translators and lay users. Third Workshop on Post-Editing Technology and Practice; October 2014; Vancouver, BC, Canada. pp. 20-33. http://www.amtaweb.org/AMTA2014Proceedings/AMTA2014Proceedings_PEWorkshop_final.pdf
6. Koehn P, Germann U. The impact of machine translation quality on human post-editing. Workshop on Humans and Computer-Assisted Translation; 2014; Gothenburg, Sweden. pp. 38-46. http://www.aclweb.org/anthology/W14-0307.pdf
7. Turner AM, Bergman M, Brownstein M, Cole K, Kirchhoff K. A comparison of human and machine translation of health promotion materials for public health practice: time, costs, and quality. J Public Health Manag Pract. 2014;20(5):523-529. doi: 10.1097/PHH.0b013e3182a95c87.
8. Kirchhoff K, Turner AM, Axelrod A, Saavedra F. Application of statistical machine translation to public health information: a feasibility study. J Am Med Inform Assoc. 2011;18(4):473-478. doi: 10.1136/amiajnl-2011-000176.
9. Pandya C, Batalova J, McHugh M. Limited English Proficient Individuals in the United States: Number, Share, Growth, and Linguistic Diversity. Migration Policy Institute; 2011.
http://www.immigrationresearch-info.org/report/migration-policy-institute/limited-english-proficient-individuals-united-states-number-share-
10. Linguistic Data Consortium. Linguistic Data Annotation Specification: Assessment of Fluency and Adequacy in Translations, Revision 1.5. 2005 Jan 25. https://www.ldc.upenn.edu/collaborations/past-projects
11. Guerberof A. Productivity and quality in MT post-editing. Machine Translation Summit XII; August 2009. http://www.mt-archive.info/MTS-2009-Guerberof.pdf
12. Guerberof A. Correlations between productivity and quality when post-editing in a professional context. Machine Translation. 2014 Nov 20;28(3-4):165-186. doi: 10.1007/s10590-014-9155-y.
13. Kirchhoff K, Capurro D, Turner AM. A conjoint analysis framework for evaluating user preferences in machine translation. Mach Transl. 2014 Mar 1;28(1):1-17. doi: 10.1007/s10590-013-9140-x.
14. Temnikova I. Cognitive evaluation approach for a controlled language post-editing experiment. Proceedings of the Seventh International Conference on Language Resources and Evaluation; May 2010; Valletta, Malta. pp. 3485-3490. http://www.lrec-conf.org/proceedings/lrec2010/pdf/437_Paper.pdf
15. Lacruz I, Denkowski M, Lavie A. Cognitive demand and cognitive effort in post-editing. Third Workshop on Post-Editing Technology and Practice; October 2014; Vancouver, BC, Canada. pp. 73-84. http://www.amtaweb.org/AMTA2014Proceedings/AMTA2014Proceedings_PEWorkshop_final.pdf
16. Koponen M, Aziz W, Ramos L, Specia L. Post-editing time as a measure of cognitive effort. Workshop on Post-Editing Technology and Practice; October 2012; San Diego, CA. http://amta2012.amtaweb.org/AMTA2012Files/html/13/13_paper.pdf
17. Koehn P. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the ACL; June 2010; Los Angeles, CA. pp. 537-545. http://www.aclweb.org/anthology/N10-1078

__________________________________________________________________

We deliver high-quality translations that bring your products and services to new countries faster than ever before.

Highest Quality Translations: Lilt's AI works alongside translators, bringing a level of quality that was previously impossible.
Translations done by domain experts: Our translators have helped some of the greatest companies in the world break into new markets.
Enterprise-level security: Your data are always encrypted and never shared with anyone.

See why top companies choose Lilt. From Zendesk: "We didn't want an off-the-shelf solution.
We needed something we could customize as much as possible to our own vocabulary, and that could instantaneously learn as we went along with our human and machine translations." (Melissa Burch, Zendesk)

Knowledge should be universally accessible. Ours is.

Whitepaper, Machine Translation Evaluation: With so many competing technologies, how can you be sure any one solution will set your business up for success?
Webinar, Intro to Neural Machine Translation: Neural machine translation is everywhere. What are the advantages of using it over existing technologies? Glad you asked.

Become a Translator: Work with the top 1% of translators in the world. Translate where your domain expertise is appreciated, thanks to our predictable project pipeline, fast payment terms, and industry-leading software. Oh, and you'll never be asked to post-edit ever again.

As Seen In:
The Economist, January 15, 2017: Machine Translation: Beyond Babel
The Wall Street Journal, August 19, 2018: Models Will Run The World
BBC World Service, May 27, 2018: From Language to Algorithm
Wired, November 15, 2017: Welcome To The Era of The AI Co-worker
Inc., January 28, 2018: 20 of Marc Benioff's Best Startup Investments

Ready to get started? Let us show you how Lilt can bring your products and services to new countries.

__________________________________________________________________

EN 601.468/668 Machine Translation
Fall 2018, Tuesdays and Thursdays 1:30-2:45, Ames 234
Computer Science Department, Johns Hopkins University

Google Translate instantly translates between any pair of over eighty human languages, like French and English. How does it do that? Why does it make the errors that it does? And how can you build something better? Modern translation systems like Google Translate and Bing Translator learn to translate by reading millions of words of already translated text. This course will show you how they work. We cover fundamental building blocks from linguistics, machine learning (especially deep learning), algorithms, and data structures, showing how they apply to a difficult real-world artificial intelligence problem.

Instructor: Philipp Koehn (phi@jhu.edu)
TAs: Huda Khayrallah (huda@jhu.edu), Brian Thompson (brian.thompson@jhu.edu), Tanay Agarwal (tagarwa2@jhu.edu)
Office hours: Professor by appointment; TAs Monday 10-12, Barton 225, and Tuesday 10:30-11:30, Malone Undergraduate Lab
Discussion forum: Piazza

Textbooks: The class follows two textbooks closely.
+ Statistical Machine Translation by Philipp Koehn, 2010. You can read it online through the JHU library or purchase it from Amazon.
+ Neural Machine Translation by Philipp Koehn, 2019. A draft copy of the book will be distributed by email; contact the professor to receive a copy.

Grading: To understand how machine translation works, you will build a translation system. We will mainly grade hands-on work.
+ Five homework assignments (12% each)
+ Final project (30%)
+ In-class presentation, "Language in ten minutes" (10%)

Homework schedule (tentative):
+ HW1: Analysis, due September 13
+ HW2: Word alignment, due September 27
+ HW3: Decoding, due October 11
+ HW4: Neural translation model part 1, due October 25
+ HW5: Neural translation model part 2, due November 8
Late penalty for homework assignments: 10% per day.

Last updated November 30, 2018. Created with git, jekyll, bootstrap, and vim. Feel free to reuse the source code.

__________________________________________________________________

MateCat: Machine Translation Engines

In MateCat, the best option when creating a project is to select MyMemory, which uses a combination of Google Translate and Microsoft Translator to provide machine translation suggestions. You can also disable machine translation suggestions by unchecking the corresponding box under the field "Use in this project" in the Machine Translation tab. You can also connect machine translation engines provided by MMT, Microsoft Translator Hub, IPTranslator from Iconic, Tilde MT, Apertium, AltLang, Yandex.Translate, Tauyou, SmartMate, and Deeplingo, or your own Moses engines, directly from the MateCat online CAT tool. All you need are the credentials granted by your machine translation provider. To enable them, click on Options on the home page, then on Add MT engine, and select the engine from the Machine Translation dropdown menu. The same steps can be taken from the Language Resources panel. Find out more on this topic in the specific section of the FAQ.

MateCat is an enterprise-level, online CAT tool which makes post-editing and outsourcing easy. It is used by large enterprises not just as a CAT tool, but also as a platform to build innovative services and tools. We provide software customization, hosting, dedicated support, and so on for companies, organizations, and translation agencies with specific requirements.

__________________________________________________________________

How to Use Machine Translation to Localize UGC for Global Websites

Is the use of machine translation evil for SEO? In terms of global website content translation or localization, the best practice is to have content localized professionally by a native speaker. However, just like everything else, there's a best practice, and there's the reality of conducting business. So, what is the reality of running a global website?
How does the best practice apply, or not apply, especially when it comes to user-generated content (UGC)?

The Challenge of Content Localization for Global Websites

One of the real-life situations that businesses deal with is the challenge of increasing user engagement without negatively impacting SEO performance. Site owners agonize over following the best practices for the fixed content on their websites, but the speed and cost of professional translation often prevent them from applying this best practice to UGC translation. Because of this challenge, I often see global websites leaving UGC in English, or in the source language, on their local sites because they are trying to follow the SEO best practice. I understand that website owners are concerned about the SEO implications of machine translation. However, when content is not translated into the local language, it won't help site visitors or website owners. Let's go through this challenge step by step to see if we can find some middle ground.

Selecting Content for Machine Translation

Before we dive deep into the topic, I'd like to clarify that this article is specific to user-generated content, not the entire website. Fixed content should always be translated and localized professionally by humans, without exception. Page headers and commonly used text, such as column labels, should also be localized and checked by humans. If you don't want UGC to rank well in the search results, or even to be indexed by search engines, it is the safest area in which to implement machine translation. User comments, feedback, reviews, and other material that is not the main content of the page can easily be handled with machine translation. Even if the translation is not perfect, it provides helpful information to site visitors when they can read it in their own languages. When the UGC is on pages you wish to be indexed by the search engines and to perform well in the organic search results, you need to determine the best translation solution.

Crowdsourced Translation

This is not machine translation, but another option that some websites use to localize their content. It usually relies on a database of words, which participants access to add the words in other languages. It's a low-cost solution when you have volunteers to do the translation work; Wikipedia is probably the largest global website using it. Because it depends on crowd participation, it comes with some concerns. It is difficult to maintain the quality of the translation, and some languages may take much longer to build a database large enough to translate content. This becomes a bigger issue when the source language is not one of the more widely spoken and read languages. Some machine translation tools let you create a glossary database from words and phrases translated through crowdsourcing. Below is an example of a clearly wrong word showing up in Google's translation tool: when a Japanese word for "mischievous" was entered, it gave an incorrect translation in English. (The translation has since been corrected.)

[Image: Translation of "mischievous" from Japanese to English]

In order to control the quality of the translation and minimize problems, I suggest that you control who can contribute to the translation project by giving tool access only to trusted editors, as sketched below.
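One way to make that access control concrete is a gated glossary workflow, where only entries from trusted editors go live and everything else waits for review. This is a minimal sketch; the names, terms, and data structures are invented for illustration:

```python
# A minimal sketch of a gated crowdsourced glossary: contributions are
# accepted into the live glossary only from trusted editors; everyone
# else's suggestions are queued for review. All names are invented.
TRUSTED_EDITORS = {"alice", "bob"}

glossary = {}          # (source_term, lang) -> approved translation
review_queue = []      # pending suggestions from untrusted contributors

def suggest(contributor, source_term, lang, translation):
    """Route a suggested term translation based on contributor trust."""
    entry = (source_term, lang, translation)
    if contributor in TRUSTED_EDITORS:
        glossary[(source_term, lang)] = translation
    else:
        review_queue.append((contributor, entry))

def approve(index):
    """A trusted reviewer promotes a queued suggestion into the glossary."""
    _, (source_term, lang, translation) = review_queue.pop(index)
    glossary[(source_term, lang)] = translation

suggest("alice", "いたずら", "en", "mischievous")  # trusted: goes live
suggest("mallory", "綺麗", "en", "pretty")          # untrusted: held
approve(0)                                          # reviewer signs off
print(glossary)
```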
The Advancement of Machine Translation with AI

As machine translation technology has advanced with AI, some websites, including large global websites such as Facebook, are implementing neural machine translation (NMT). On Facebook, you can see this real-time, text-to-text translation working on posts and comments. On its code.fb.com site, the company states:

"We have just started being able to use more context for translations. Neural networks open up many future development paths related to adding further context, such as a photo accompanying the text of a post, to create better translations. We are also starting to explore multilingual models that can translate many different language directions. This will solve the challenge of fine-tuning each system relating to a specific language pair, and may also bring quality gains from some directions through the sharing of training data."

Other companies, including Google and Microsoft, also offer NMT solutions for websites and other translation needs. In addition to text translation, Microsoft developed automatic speech recognition (ASR) for audio speech translation, currently used in Skype.

Improve the Quality of the Translation

Even with these advancements, the fact is that machine translation is not perfect just yet. That said, machine translation quality has improved significantly, especially for Western languages. The following are some things you can do to ensure the quality of the translation:

- Create a list of commonly used words (e.g., categories, tags, product names, other keywords). Get them translated professionally, or even in-house, and upload the list to the translation engine.
- Spot-check the translation from time to time to ensure the quality of translated content.
- Add an online dictionary using its API.
- Use industry-specific machine translation, which can handle industry-specific jargon and B2B vocabulary better.

Optimize the Machine Translation Engine

- Integrate translation management system (TMS) environments for machine translation engine implementation.
- Customize the machine translation engine for the content type.
- Create training data for AI and machine learning.

Still concerned about using machine translation in terms of SEO? Here's a comment on machine-translated content by Google's John Mueller:

"I think the kind of the improvements that are happening with regards to automatically translated content… It could also be used by sites that are legitimately providing translations on a website and they just start with like the auto translated version and then they improve those translations over time. So that's something where I wouldn't necessarily say that using translated content like that (spamming content) would be completely problematic but it's more a matter of the intent and kind of the bigger picture what they're doing."

Many websites already use machine translation for their global sites. Their content is indexed and can perform well by providing quality content for local audiences. Indeed, it comes back to the "intent" Mueller spoke about. Translating UGC to provide informative content to your local audience falls under "good intent."

Conclusion

Machine translation can be a great solution for some global websites, specifically for handling large volumes of user-generated content.
Making reviews and comments available in different languages can significantly increase visitor satisfaction, engagement, and (most importantly) sales. Don't let broad standards keep you from serving your consumers. Review the following and make the best decision for your business:

- Determine the content on your site that is appropriate for machine translation.
- Select the translation solution that works best for your website content.
- Optimize the machine translation engine by adding industry-specific terms, keywords, etc.
- Create training data for AI.
- Monitor the quality of the translation.

More resources: 5 Content Management Tips for Global Websites; A Quick Guide to Getting Started in International SEO; A Complete Guide to SEO.

__________________________________________________________________

Machine Translation, by Thierry Poibeau (MIT Press, September 2017; ISBN 9780262534215; 296 pp.; 28 b&w illustrations)

A concise, nontechnical overview of the development of machine translation, including the different approaches, evaluation issues, and major players in the industry.

The dream of a universal translation device goes back many decades, long before Douglas Adams's fictional Babel fish provided this service in The Hitchhiker's Guide to the Galaxy. Since the advent of computers, research has focused on the design of digital machine translation tools—computer programs capable of automatically translating a text from a source language to a target language. This has become one of the most fundamental tasks of artificial intelligence. This volume in the MIT Press Essential Knowledge series offers a concise, nontechnical overview of the development of machine translation, including the different approaches, evaluation issues, and market potential. The main approaches are presented from a largely historical perspective and in an intuitive manner, allowing the reader to understand the main principles without knowing the mathematical details. The book begins by discussing problems that must be solved during the development of a machine translation system and offering a brief overview of the evolution of the field. It then takes up the history of machine translation in more detail, describing its pre-digital beginnings, rule-based approaches, the 1966 ALPAC (Automatic Language Processing Advisory Committee) report and its consequences, the advent of parallel corpora, the example-based paradigm, the statistical paradigm, the segment-based approach, the introduction of more linguistic knowledge into the systems, and the latest approaches based on deep learning. Finally, it considers evaluation challenges and the commercial status of the field, including activities by such major players as Google and Systran.
Thierry Poibeau is Director of Research at the Centre National de la Recherche Scientifique in Paris, Head of the LATTICE (Langues, Textes, Traitements Informatiques, Cognition) Laboratory, and Affiliated Lecturer in the Department of Theoretical and Applied Linguistics at the University of Cambridge.

__________________________________________________________________

Unbabel: the world's only human-quality translation pipeline

Trusted by Microsoft, Change.org, Pinterest, Soundcloud, Under Armour, and King.

AI + Human Translation API: get human-quality translations of your content piped where you need it.
- Continuous translation: Unbabel can translate all your content seamlessly.
- 50,000+ editor community: AI-assisted human translators around the globe translate your content.
- Final human touch: human quality is enforced by skilled professionals before delivery.
- AI, glossaries, and style guides: AI-assisted guidance ensures translation quality and speed at every step.

State-of-the-art translation: Unbabel employs custom machine translation engines using state-of-the-art neural machine translation (NMT) adapted to our customers' domains.

50,000+ editor community: we work with a community of professional translators and native speakers. They're on the move, around the globe, working on the Unbabel Platform on their computers and mobile phones.

Glossaries and style guides: customer glossaries and style guides assure quality and consistency with your brand's voice in every translation.

The world's best Quality Estimation system: Unbabel has the world's most advanced Quality Estimation system, winning multiple shared tasks at the Workshop on Machine Translation by wide margins. We use it to rank our translations and to identify incorrect words for our editors to pay special attention to.

Better than out-of-the-box Google, Microsoft, and Yandex: we incorporate customer-specific training data, machine translation engines adapted by content type, and a host of machine learning algorithms to beat out-of-the-box MT solutions from some of the biggest names in tech.

Developers are welcome: with a fully functional SDK for Python, and SDKs for Ruby and PHP in development, you can put Unbabel to work for you as quickly as possible.
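Unbabel's actual routes and payloads are not documented here, so the following is only a generic sketch of what submitting a job to a translation API of this kind might look like. The endpoint, field names, and response shape are all invented; consult the real API documentation before writing anything like this. It uses the third-party requests library:

```python
import requests

# Hypothetical endpoint and payload, for illustration only; the real
# service's routes, fields, and authentication scheme will differ.
API_URL = "https://api.example.com/v1/translations"

def request_translation(text, source_lang, target_lang, api_key):
    """Submit a text for machine translation plus human postediting
    and return the job record created by the (hypothetical) service."""
    payload = {
        "text": text,
        "source_language": source_lang,
        "target_language": target_lang,
    }
    resp = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    job = request_translation("Where is my order?", "en", "pt", "MY_KEY")
    print(job)
```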
Use Cases

The Unbabel Translation API seamlessly integrates with your workflows, business processes, websites, apps, communications platforms, and more.
- Your platform, multilingual: build multilingual platform integrations, like we've done for Salesforce, Zendesk, and more.
- CMS translation on the go: translate within your CMS so you can easily publish new content in multiple languages.
- Build and partner with Unbabel: develop an Unbabel integration with another platform and list it on our Marketplace.
Reach customers in their native language with Unbabel.

__________________________________________________________________

From the AMPLEXOR blog (Machine Translation): "The Role of Translation Technology in Successful Communication," by Kristina Bauer, 4 min read, 11/05/17.

__________________________________________________________________

Pure Neural™ Machine Translation: SYSTRAN's neural translation engine

Artificial intelligence and deep learning applied to language processing.

Neural translation: back to the origins. The terms "deep learning" and "artificial neural networks" are surely not unknown to you. Many of us have used, without knowing it, solutions based on this technology, such as image recognition, big-data analytics, and the virtual assistants that the Web giants have built into their services. More recently, a great deal of research has been conducted on what these new technologies can contribute to language processing.
The results of this research are shared within an open-source community in which SYSTRAN is very active, contributing its own knowledge.

A self-learning machine. Unlike the technologies used on the market until now (statistical and rule-based), the neural engine handles the entire machine translation process through a single artificial neural network. The network is composed of several layers connected to one another with different weights, called the parameters of the network. The key property of the neural network is its capacity to correct its parameters automatically during the training phase (a few weeks). Concretely, what is generated as output is compared to a reference translation, and in return a correction is "back-propagated" to adjust the weights and refine the parameterization of the network's connections. This technology, which rests on complex algorithms at the cutting edge of deep learning, allows the PNMT™ (Pure Neural™ Machine Translation) engine to learn, to derive the rules of a language from a reference translation, and to produce a translation whose quality surpasses the state of the art and proves better than that of a non-native speaker of the language.
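The training loop described above (generate an output, compare it against a reference, back-propagate a correction that adjusts the weights) can be made concrete with a tiny numpy sketch. This is a toy linear model on synthetic data, not a translation network; it only illustrates the compare-and-correct cycle:

```python
import numpy as np

# Toy illustration of the training loop described above: generate an
# output, compare it with a reference, and back-propagate a correction
# that adjusts the weights (the network's "parameters").
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))            # toy inputs
true_W = rng.normal(size=(8, 4))
Y_ref = X @ true_W                       # toy "reference translations"

W = np.zeros((8, 4))                     # parameters to be learned
lr = 0.05                                # learning rate

for step in range(500):
    Y_out = X @ W                        # forward pass: generate output
    err = Y_out - Y_ref                  # compare with the reference
    grad = X.T @ err / len(X)            # back-propagate the correction
    W -= lr * grad                       # adjust the weights

print(np.abs(W - true_W).max())          # near 0 once training converges
```

A real NMT engine stacks many nonlinear layers and trains on millions of sentence pairs, but the update cycle is the same shape as this loop.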
Helping companies succeed in the era of immediacy. In the digital era, the language barrier has so far been one of the greatest challenges to rapidly deploying an international business strategy. Companies now have the opportunity to reach new markets more easily thanks to the latest advances in artificial intelligence and in machine translation research and development. With this major innovation, SYSTRAN continues its pursuit of technological excellence, aiming to help companies and organizations give themselves the means to succeed in a world of global communication, with its demands for 24/7 availability and real-time responsiveness. SYSTRAN offers organizations access to the best translation quality on the market, close to the fluency of a human translation and adapted to each client's specific needs and domain (legal, automotive, IT, tourism, and so on). Companies can thus deploy their business strategy in several countries simultaneously, overcoming the language barrier and gaining substantially in productivity and time to market.

Testimonials:

"The new PNMT™ technology offers a translation quality and fluency unequaled in the history of machine translation. There remain, however, areas for improvement, on which SYSTRAN's R&D teams are already working. There is no doubt that this technology will open new perspectives for translators and their colleagues in a globalized world." (Heidi Depraetere, founder of Crosslang)

"The PNMT™ translation engine created by SYSTRAN is a great step forward for communication in general, and for tourism in particular. It brings a fabulous field of opportunities, new experiences, an exciting journey into the land of languages! The augmented tourist 2.0 is born!" (Dominique Auzias, founder of Petit Futé)

"Better still, the SYSTRAN PNMT™ engine understood what I meant to say and translated it very fluently. In most cases, the terminology was adequate, and the sentences 'sounded' like human sentences." (Lori Thicke, CEO of Lexcelera)

Specialization multiplies the potential of neural translation. SYSTRAN is today the only player capable of specializing a neural engine; this unique know-how markedly improves translation quality in record time.

"Adapting translation to a specific domain (legal, marketing, technical, and so on) is an absolute necessity for global companies and organizations. Offering professionals translation solutions specialized in their business terminology has been SYSTRAN's DNA for many years. The new generation of neural engines opens new possibilities for specialization: PNMT™ is capable of adapting the generic model to new data, and even to each individual translator. Generic neural translation unquestionably brings a quantum leap in the history of translation technologies, but it is specialized neural translation that will truly allow organizations to reach their international goals." (Jean Senellart, Chief Technology Officer of SYSTRAN)
__________________________________________________________________

Appen: high-quality training data for machine learning, enhanced by human interaction

Machine Translation: drive higher customer satisfaction with automatic translation capabilities that are highly accurate. Build more robust automatic translation systems with high-quality data.

How we help: systems that rely on machine translation require high-quality speech and text data to produce accurate results. It is often challenging to find the resources needed to supply your system with enough quality data in all of your target markets.
Working with an experienced partner can greatly accelerate your time to market and can result in a system that builds stronger customer satisfaction.

Our approach: Appen can help you customize your machine translation engines with a range of services, from domain-specific training and test sets to postediting, machine translation evaluation, and linguistic services. Our skilled project managers work with your team to understand your objectives and timeline, and will customize a program to meet your needs.

Why Appen? For over 20 years, Appen has worked with companies around the world to improve their speech and machine learning-based solutions by providing high-quality, human-annotated data. With coverage of over 180 languages and dialects, we can help you reach more customers around the globe.

Our data services:
- Consultative Services: we work closely with your team to develop a customized program that addresses your unique business challenges.
- Language Technology QA: develop top-notch language-based solutions with language quality assurance services.
- Lexicons and Word Lists: use custom lexicons and word lists to ensure the accuracy of your speech and text-based systems.
- Linguistic Consulting: ensure your solutions meet the needs of customers worldwide with the help of expert linguists.
- Speech Data Collection: use our curated global crowd to collect high-quality speech data in over 180 languages and dialects.
- Text Data Collection: collect millions of high-quality data samples to ensure your solution meets the needs of your customers worldwide.
- Translation and Localization: traditional translation and machine translation services from language and data experts.

Additional resources: Appen Recognized Among Largest Language Service Providers in the World; Insights from Conversational Interaction 2018: NLP, Chatbots, and Comedians; AI Requires a Human Touch: How Appen Recruits Crowds to Improve Technology; Appen Off-the-Shelf Linguistic Resources (quickly expand your products into new markets with licensed language data, with immediate access to a complete speech and language database to accelerate your product development).

Appen is a global leader in the development of high-quality, human-annotated datasets for machine learning and artificial intelligence.
__________________________________________________________________

A history of machine translation from the Cold War to deep learning
Ilya Pestov, Mar 12, 2018
(Photo by Ant Rozetsky on Unsplash)

I open Google Translate twice as often as Facebook, and the instant translation of price tags no longer feels like cyberpunk to me. That's what we call reality. It's hard to imagine that this is the result of a century-long fight to build the algorithms of machine translation, and that there was no visible success during half of that period. The developments I'll discuss in this article set the basis of all modern language processing systems — from search engines to voice-controlled microwaves. I'm talking about the evolution and structure of online translation today.

[Image: The translating machine of P. P. Troyanskii (illustration made from descriptions; no photos survive, unfortunately)]

In the beginning

The story begins in 1933. The Soviet scientist Peter Troyanskii presented "the machine for the selection and printing of words when translating from one language to another" to the Academy of Sciences of the USSR. The invention was super simple — it had cards in four different languages, a typewriter, and an old-school film camera. The operator took the first word from the text, found a corresponding card, took a photo, and typed its morphological characteristics (noun, plural, genitive) on the typewriter. The typewriter's keys encoded one of the features. The tape and the camera's film were used simultaneously, making a set of frames with words and their morphology.

Despite all this, as often happened in the USSR, the invention was considered "useless." Troyanskii died of stenocardia (angina) after trying to finish his invention for 20 years. No one in the world knew about the machine until two Soviet scientists found his patents in 1956.

It was at the beginning of the Cold War. On January 7, 1954, at IBM headquarters in New York, the Georgetown-IBM experiment started. The IBM 701 computer automatically translated 60 Russian sentences into English for the first time in history. "A girl who didn't understand a word of the language of the Soviets punched out the Russian messages on IBM cards. The 'brain' dashed off its English translations on an automatic printer at the breakneck speed of two and a half lines per second," reported the IBM press release.

However, the triumphant headlines hid one little detail. No one mentioned that the translated examples were carefully selected and tested to exclude any ambiguity. For everyday use, that system was no better than a pocket phrasebook. Nevertheless, this sort of arms race launched: Canada, Germany, France, and especially Japan all joined the race for machine translation.
The race for machine translation

The vain struggle to improve machine translation lasted for forty years. In 1966, the US ALPAC committee, in its famous report, called machine translation expensive, inaccurate, and unpromising. It recommended focusing on dictionary development instead, which took US researchers out of the race for almost a decade. Even so, the basis for modern natural language processing was created by those scientists and their attempts, research, and developments. All of today's search engines, spam filters, and personal assistants appeared thanks to a bunch of countries spying on each other.

Rule-based machine translation (RBMT)

The first ideas surrounding rule-based machine translation appeared in the 70s. Scientists peered over interpreters' work, trying to compel tremendously sluggish computers to repeat those actions. These systems consisted of:

A bilingual dictionary (e.g., RU -> EN)
A set of linguistic rules for each language (for example, nouns ending in certain suffixes such as -heit, -keit, -ung are feminine)

That's it. If needed, systems could be supplemented with hacks, such as lists of names, spelling correctors, and transliterators.

PROMT and Systran are the most famous examples of RBMT systems. Just take a look at AliExpress to feel the soft breath of this golden age. But even they had some nuances and subspecies.

Direct machine translation

This is the most straightforward type of machine translation. It divides the text into words, translates them, slightly corrects the morphology, and harmonizes syntax to make the whole thing sound right, more or less. When the sun goes down, trained linguists write the rules for each word. The output returns some kind of translation. Usually, it's quite crappy. It seems that the linguists wasted their time for nothing. Modern systems do not use this approach at all, and modern linguists are grateful.

Transfer-based machine translation

In contrast to direct translation, we first prepare by determining the grammatical structure of the sentence, as we were taught at school. Then we manipulate whole constructions, not words. This helps to get quite a decent conversion of word order in translation. In theory. In practice, it still resulted in verbatim translation and exhausted linguists. On the one hand, it brought simplified general grammar rules. On the other, it became more complicated because of the increased number of word constructions in comparison with single words.

Interlingual machine translation

In this method, the source text is transformed into an intermediate representation that is unified for all the world's languages (the interlingua). It's the same interlingua Descartes dreamed of: a meta-language that follows universal rules and turns translation into a simple "back and forth" task. The interlingua would then be converted to any target language, and here was the singularity! Because of this conversion, interlingual systems are often confused with transfer-based ones. The difference is that the linguistic rules are specific to each individual language and the interlingua, not to language pairs. This means we can add a third language to an interlingua system and translate between all three, which we can't do in transfer-based systems.

It looks perfect, but in real life it's not. It was extremely hard to create such a universal interlingua — many scientists worked on it their whole lives. They did not succeed, but thanks to them we now have morphological, syntactic, and even semantic levels of representation. But the Meaning-Text Theory alone costs a fortune! The idea of an intermediate language will be back. Let's wait awhile.
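To make the rule-based idea concrete, here is a toy sketch of the direct, dictionary-plus-rules approach described above. Everything in it, the dictionary entries and the single hand-written reordering rule, is hypothetical and far simpler than a real RBMT system:

    # A toy direct machine translation: dictionary lookup plus hand-written rules.
    # All data here is invented; real RBMT systems used thousands of rules.
    bilingual_dict = {
        "the": "le", "red": "rouge", "car": "voiture", "is": "est", "fast": "rapide",
    }

    def translate_direct(sentence):
        words = sentence.lower().split()
        out = [bilingual_dict.get(w, w) for w in words]   # word-for-word lookup
        # One hand-written syntax rule: in French, most adjectives follow the noun.
        for i in range(len(out) - 1):
            if words[i] == "red" and words[i + 1] == "car":
                out[i], out[i + 1] = out[i + 1], out[i]
        return " ".join(out)

    print(translate_direct("the red car is fast"))  # "le voiture rouge est rapide"
    # Note the wrong gender (it should be "la voiture"): exactly the kind of
    # exception the rule writers had to chase, word by word.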
As you can see, all RBMT systems are dumb and terrifying, and that's the reason they are rarely used except for specific cases (like weather report translation, and so on). Among the advantages of RBMT that are often mentioned are its morphological accuracy (it doesn't confuse words), reproducibility of results (all translators get the same result), and the ability to tune it to a subject area (to teach it terms specific to economists or programmers, for example).

Even if anyone were to succeed in creating an ideal RBMT, and linguists enhanced it with all the spelling rules, there would always be exceptions: all the irregular verbs in English, separable prefixes in German, suffixes in Russian, and situations where people just say things differently. Any attempt to take all the nuances into account would waste millions of man-hours. And don't forget about homonyms. The same word can have a different meaning in a different context, which leads to a variety of translations. How many meanings can you catch here: "I saw a man on a hill with a telescope"?

Languages did not develop based on a fixed set of rules — a fact which linguists love. They were much more influenced by the history of invasions over the past three hundred years. How could you explain that to a machine? Forty years of the Cold War didn't help in finding any distinct solution. RBMT was dead.

Example-based machine translation (EBMT)

Japan was especially interested in fighting for machine translation. There was no Cold War there, but there were reasons: very few people in the country knew English. It promised to be quite an issue at the upcoming globalization party. So the Japanese were extremely motivated to find a working method of machine translation.

Rule-based English–Japanese translation is extremely complicated. The language structures are completely different, and almost all words have to be rearranged and new ones added. In 1984, Makoto Nagao from Kyoto University came up with the idea of using ready-made phrases instead of repeated translation. Let's imagine that we have to translate a simple sentence — "I'm going to the cinema." And let's say we've already translated another similar sentence — "I'm going to the theater" — and we can find the word "cinema" in the dictionary. All we need is to figure out the difference between the two sentences, translate the missing word, and then not screw it up. The more examples we have, the better the translation. I build phrases in unfamiliar languages exactly the same way!

EBMT showed the light of day to scientists from all over the world: it turns out you can just feed the machine existing translations and not spend years forming rules and exceptions. Not a revolution yet, but clearly the first step towards one.
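A minimal sketch of Nagao's idea, assuming a toy example base of already-translated sentence pairs and a small bilingual dictionary (all data invented):

    # Toy example-based translation: reuse the closest translated example,
    # then patch the one word that differs.
    examples = {
        "i'm going to the theater": "je vais au théâtre",
    }
    dictionary = {"cinema": "cinéma", "theater": "théâtre"}

    def translate_ebmt(sentence):
        src = sentence.lower().rstrip(".")
        # Find an example differing from the input by exactly one word.
        for ex_src, ex_tgt in examples.items():
            a, b = src.split(), ex_src.split()
            if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1:
                i = next(i for i, (x, y) in enumerate(zip(a, b)) if x != y)
                old, new = dictionary[b[i]], dictionary[a[i]]
                return ex_tgt.replace(old, new)  # patch the differing word
        return None

    print(translate_ebmt("I'm going to the cinema."))  # "je vais au cinéma"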
The revolutionary invention of statistical translation would happen in just five years.

Statistical machine translation (SMT)

In early 1990, at the IBM Research Center, a machine translation system was shown for the first time that knew nothing about rules or linguistics as a whole. It analyzed similar texts in two languages and tried to understand the patterns. The idea was simple yet beautiful. An identical sentence in two languages was split into words, which were matched afterwards. This operation was repeated about 500 million times to count, for example, how many times the word "Das Haus" was translated as "house" vs. "building" vs. "construction", and so on. If most of the time the source word was translated as "house", the machine used that. Note that we did not set any rules nor use any dictionaries — all conclusions were drawn by the machine, guided by statistics and the logic that "if people translate it that way, so will I." And so statistical translation was born.

The method was much more efficient and accurate than all the previous ones. And no linguists were needed. The more text we used, the better the translation we got.

Google's statistical translation from the inside: it shows not only the probabilities but also the reverse statistics.

There was still one question left: how would the machine correlate the word "Das Haus" with the word "building", and how would we know these were the right translations? The answer was that we wouldn't know. At the start, the machine assumed that the word "Das Haus" correlated equally with every word in the translated sentence. Next, when "Das Haus" appeared in other sentences, the number of correlations with "house" would increase. That's the "word alignment algorithm," a typical task for university-level machine learning.

The machine needed millions and millions of sentences in two languages to collect the relevant statistics for each word. How did we get them? We decided to take the proceedings of the European Parliament and the United Nations Security Council meetings, which were available in the languages of all member countries and are now available for download (the UN Corpora and the Europarl Corpora).

Word-based SMT

In the beginning, the first statistical translation systems worked by splitting the sentence into words, since this approach was straightforward and logical. IBM's first statistical translation model was called Model 1. Quite elegant, right? Guess what they called the second one?

Model 1: "the bag of words"

Model 1 used a classical approach: split into words and count the statistics. Word order wasn't taken into account. The only trick was translating one word into multiple words. For example, "Der Staubsauger" could turn into "vacuum cleaner," but that didn't mean the reverse would hold. There are some simple implementations in Python: shawa/IBM-Model-1.
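To give a feel for what "counting statistics" means here, below is a minimal sketch of IBM Model 1's expectation-maximization loop on a toy two-sentence corpus. It is a bare-bones illustration of the alignment idea (no NULL token, no real corpus), not the full model:

    from collections import defaultdict

    # Toy parallel corpus (German -> English), purely illustrative.
    corpus = [
        ("das haus".split(), "the house".split()),
        ("das buch".split(), "the book".split()),
    ]

    t = defaultdict(lambda: 0.25)  # t(e|f), initialized uniformly

    for _ in range(10):  # EM iterations
        count = defaultdict(float)
        total = defaultdict(float)
        for f_sent, e_sent in corpus:
            for e in e_sent:
                norm = sum(t[(e, f)] for f in f_sent)
                for f in f_sent:
                    frac = t[(e, f)] / norm       # expected alignment count
                    count[(e, f)] += frac
                    total[f] += frac
        for (e, f), c in count.items():          # M-step: renormalize
            t[(e, f)] = c / total[f]

    print(round(t[("house", "haus")], 2))  # converges towards 1.0

Even on this tiny corpus, the probabilities drift toward the right alignments: "haus" only ever co-occurs with "house", so its translation probability climbs with every iteration, exactly the "if people translate it that way, so will I" logic described above.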
Model 2: considering word order in sentences

The lack of knowledge about word order became a problem for Model 1, and word order is very important in some cases. Model 2 dealt with that: it memorized the usual place a word takes in the output sentence and shuffled the words into a more natural order at an intermediate step. Things got better, but they were still kind of crappy.

Model 3: extra fertility

New words appeared in the translation quite often, such as articles in German or "do" when negating in English ("Ich will keine Persimonen" → "I do not want persimmons"). To deal with this, two more steps were added to Model 3:

The NULL token insertion, if the machine considers a new word necessary
Choosing the right grammatical particle or word for each token-word alignment

Model 4: word alignment

Model 2 considered word alignment, but knew nothing about reordering. For example, adjectives often switch places with the noun, and no matter how well the order was memorized, it wouldn't make the output better. Therefore, Model 4 took into account the so-called "relative order": the model learned whether two words always switched places.

Model 5: bugfixes

Nothing new here. Model 5 got some more parameters for learning and fixed the issue of conflicting word positions.

Despite their revolutionary nature, word-based systems still failed to deal with cases, gender, and homonymy. Every single word was translated in a single "true" way, according to the machine. Such systems are not used anymore, as they've been replaced by the more advanced phrase-based methods.

Phrase-based SMT

This method builds on all the word-based translation principles: statistics, reordering, and lexical hacks. For learning, however, it split the text not only into words but also into phrases; n-grams, to be precise, which are contiguous sequences of n words. Thus, the machine learned to translate stable combinations of words, which noticeably improved accuracy.

The trick was that the phrases were not always simple syntactic constructions, and the quality of the translation dropped significantly if anyone aware of linguistics and sentence structure interfered. Frederick Jelinek, a pioneer of computational linguistics, once joked about this: "Every time I fire a linguist, the performance of the speech recognizer goes up."

Besides improving accuracy, phrase-based translation provided more options in choosing bilingual texts for learning. For word-based translation, an exact match between the sources was critical, which excluded literary or free translations. Phrase-based translation had no problem learning from them. To improve translation, researchers even started to parse news websites in different languages for that purpose.

Starting in 2006, everyone began to use this approach. Google Translate, Yandex, Bing, and other high-profile online translators worked as phrase-based systems right up until 2016. Each of you can probably recall moments when Google either translated a sentence flawlessly or produced complete nonsense, right? The nonsense came from phrase-based features. The good old rule-based approach consistently produced a predictable though terrible result. The statistical methods were surprising and puzzling. Google Translate turns "three hundred" into "300" without any hesitation. That's called a statistical anomaly.

Phrase-based translation became so popular that when you hear "statistical machine translation," it is what is actually meant. Up until 2016, all studies lauded phrase-based translation as the state of the art. Back then, no one even suspected that Google was already stoking its fires, getting ready to change our whole image of machine translation.
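As a rough illustration of the n-gram idea above, here is a sketch that translates by preferring the longest known phrase over word-by-word lookup. The phrase table and its entries are invented; a real system would also carry probabilities and a reordering model:

    # Toy phrase-based lookup: prefer the longest known n-gram.
    phrase_table = {
        ("spirit", "is", "willing"): "esprit est fort",
        ("spirit",): "esprit",
        ("is",): "est",
        ("willing",): "disposé",
        ("the",): "le",
    }

    def translate_phrases(words, max_n=3):
        out, i = [], 0
        while i < len(words):
            # Greedy: try the longest phrase first, then back off to shorter ones.
            for n in range(min(max_n, len(words) - i), 0, -1):
                phrase = tuple(words[i:i + n])
                if phrase in phrase_table:
                    out.append(phrase_table[phrase])
                    i += n
                    break
            else:
                out.append(words[i])  # unknown word: pass through
                i += 1
        return " ".join(out)

    print(translate_phrases("the spirit is willing".split()))
    # "le esprit est fort" -- the whole 3-gram wins over word-by-word lookup
    # (and the missing elision, "l'esprit", shows what n-grams still miss)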
Syntax-based SMT

This method should also be mentioned, briefly. Many years before the emergence of neural networks, syntax-based translation was considered "the future of translation," but the idea did not take off. Its proponents believed it was possible to merge it with the rule-based method: perform quite a precise syntactic analysis of the sentence, determining the subject, the predicate, and the other parts of the sentence, and then build a sentence tree. Using it, the machine learns to convert syntactic units between languages and translates the rest by words or phrases. That would have solved the word-alignment issue once and for all.

Example taken from Yamada and Knight [2001] and this great slide show.

The problem is that syntactic parsing works terribly, despite the fact that we considered it solved a while ago (since we have ready-made libraries for many languages). I tried to use syntactic trees for tasks a bit more complicated than parsing out the subject and the predicate, and every single time I gave up and used another method. Let me know in the comments if you ever succeed with it.

Neural machine translation (NMT)

A quite amusing paper on using neural networks in machine translation was published in 2014. The internet didn't notice it at all, except Google, who took out their shovels and started to dig. Two years later, in November 2016, Google made a game-changing announcement.

The idea was close to transferring style between photos. Remember apps like Prisma, which enhanced pictures in some famous artist's style? There was no magic. The neural network was taught to recognize the artist's paintings. Next, the last layers containing the network's decision were removed. The resulting stylized picture was just the intermediate image the network produced. That's the network's fantasy, and we consider it beautiful.

If we can transfer style to a photo, what if we try to impose another language on a source text? The text would be that precise "artist's style," and we would try to transfer it while keeping the essence of the image (in other words, the essence of the text).

Imagine I'm trying to describe my dog: average size, sharp nose, short tail, always barks. If I gave you this set of the dog's features, and if the description was precise, you could draw it, even though you have never seen it.

Now imagine the source text is a set of specific features. Basically, this means you encode it, and then let another neural network decode it back into text, but in another language. The decoder only knows its own language. It has no idea about the origin of the features, but it can express them in, for example, Spanish. Continuing the analogy, it doesn't matter how you draw the dog, with crayons, watercolor, or your finger. You paint it as you can.

Once again: one neural network can only encode a sentence into a specific set of features, and another one can only decode them back into text. Both know nothing about each other, and each knows only its own language. Recall something? The interlingua is back. Ta-da.

The question is, how do we find those features? It's obvious when we're talking about a dog, but how do we deal with text? Thirty years ago, scientists already tried to create a universal language code, and it ended in total failure. Nevertheless, we have deep learning now. And finding features is its essential task! The primary distinction between deep learning and classic neural networks lies precisely in the ability to search for those specific features, without any idea of their nature. If the neural network is big enough, and there are a couple of thousand video cards at hand, it's possible to find those features in text as well. Theoretically, we can then pass the features from the neural networks to linguists, so that they can open brave new horizons for themselves.
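A minimal sketch of this encode-to-features, decode-back idea in Keras, with toy dimensions and untrained weights (recurrent layers are used here; the next section discusses why). A real system would be trained on millions of sentence pairs:

    from keras.models import Sequential
    from keras.layers import Embedding, GRU, RepeatVector, TimeDistributed, Dense

    src_vocab, tgt_vocab, src_len, tgt_len = 200, 220, 10, 12  # toy sizes

    model = Sequential()
    # Encoder: compress the source sentence into one 64-dim feature vector.
    model.add(Embedding(src_vocab, 32, input_length=src_len))
    model.add(GRU(64, return_sequences=False))
    # Bridge: feed that same feature vector to every decoder time step.
    model.add(RepeatVector(tgt_len))
    # Decoder: unfold the features into a sentence in the target language.
    model.add(GRU(64, return_sequences=True))
    model.add(TimeDistributed(Dense(tgt_vocab, activation='softmax')))
    model.compile(loss='sparse_categorical_crossentropy', optimizer='adam')
    model.summary()  # the 64-dim bottleneck is the "set of features" above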
The question is, what type of neural network should be used for encoding and decoding? Convolutional neural networks (CNNs) fit pictures perfectly, since they operate on independent blocks of pixels. But there are no independent blocks in text: every word depends on its surroundings. Text, speech, and music are always sequential. So recurrent neural networks (RNNs) are the best choice to handle them, since they remember the previous result (the prior word, in our case).

RNNs are now used everywhere: Siri's speech recognition (parsing a sequence of sounds, where the next depends on the previous), keyboard suggestions (memorize the prior word, guess the next), music generation, and even chatbots.

For the nerds like me: in fact, neural translators' architectures vary widely. A regular RNN was used at the beginning, then it was upgraded to a bidirectional one, where the translator considered not only the words before the source word but also the words after it. That was much more effective. It was then followed by a hardcore multilayer RNN with LSTM units for long-term storage of the translation context.

In two years, neural networks surpassed everything that had appeared in the previous 20 years of translation. Neural translation makes 50% fewer word-order mistakes, 17% fewer lexical mistakes, and 19% fewer grammar mistakes. The neural networks even learned to harmonize gender and case across languages, and no one taught them to do so.

The most noticeable improvements occurred in fields where direct translation was never used. Statistical machine translation methods always worked using English as the key source: if you translated from Russian to German, the machine first translated the text into English and then from English into German, which leads to a double loss. Neural translation doesn't need that — only a decoder is required for it to work. That was the first time that direct translation between languages with no common dictionary became possible.

Google Translate (since 2016)

In 2016, Google turned on neural translation for nine languages. They developed a system named Google Neural Machine Translation (GNMT). It consists of 8 encoder and 8 decoder layers of RNNs, as well as attention connections from the decoder network.

They divided not only sentences, but also words. That was how they dealt with one of the major NMT issues: rare words. NMTs are helpless when a word is not in their lexicon. Say, "Vas3k": I doubt anyone taught the neural network to translate my nickname. In that case, GNMT tries to break words into word pieces and recover a translation from them. Smart.
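GNMT's actual wordpiece model is more involved, but the gist, falling back from whole words to known sub-units, can be sketched like this (the subword vocabulary below is invented):

    # Toy greedy wordpiece segmentation: split an unknown word into the
    # longest known subword units.
    subwords = {"vas", "3", "k", "trans", "lat", "ion"}

    def wordpieces(word):
        pieces, i = [], 0
        while i < len(word):
            # Take the longest prefix of the remainder that is in the vocabulary.
            for j in range(len(word), i, -1):
                if word[i:j].lower() in subwords:
                    pieces.append(word[i:j])
                    i = j
                    break
            else:
                pieces.append(word[i])  # unknown character: emit as-is
                i += 1
        return pieces

    print(wordpieces("Vas3k"))        # ['Vas', '3', 'k']
    print(wordpieces("translation"))  # ['trans', 'lat', 'ion']

The network then translates at the level of these pieces, so even a never-seen nickname gets some translation instead of an out-of-vocabulary token.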
Hint: the Google Translate used for website translation in the browser still uses the old phrase-based algorithm. Somehow, Google hasn't upgraded it, and the differences are quite noticeable compared to the online version.

Google uses a crowdsourcing mechanism in the online version. People can choose the version they consider the most correct, and if enough users like it, Google will always translate the phrase that way and mark it with a special badge. This works fantastically well for short everyday phrases such as "Let's go to the cinema" or "I'm waiting for you." Google knows conversational English better than I do :(

Microsoft's Bing works exactly like Google Translate. But Yandex is different.

Yandex Translate (since 2017)

Yandex launched its neural translation system in 2017. Its main declared feature was hybridity: Yandex combines the neural and statistical approaches to translate a sentence, and then chooses the best result with its favorite CatBoost algorithm. The thing is, neural translation often fails when translating short phrases, since it uses context to choose the right word, and that is hard if a word appeared only a few times in the training data. In such cases, a simple statistical translation finds the right word quickly and simply.

Yandex doesn't share the details. It fends us off with marketing press releases. OKAY.

It looks like Google uses SMT for the translation of words and short phrases. They don't mention that in any articles, but it's quite noticeable if you look at the difference between translations of short and long expressions. Besides, SMT is used for displaying a word's statistics.

The conclusion and the future

Everyone's still excited about the idea of a "Babel fish": instant speech translation. Google has made steps toward it with its Pixel Buds, but in fact it's still not what we were dreaming of. Instant speech translation is different from ordinary translation: you need to know when to start translating and when to shut up and listen. I haven't seen suitable approaches to solving this yet. Unless, maybe, Skype...

And here's one more empty area: all the learning is limited to sets of parallel text blocks. The deepest neural networks still learn from parallel texts. We can't teach a neural network without providing it with a source. People, by contrast, can expand their lexicon by reading books or articles, even without translating them into their native language. If people can do it, a neural network should be able to do it too, in theory. I found only one prototype attempting to get a network that knows one language to read texts in another language in order to gain experience. I'd try it myself, but I'm silly. Ok, that's it.

This story was originally written in Russian and then translated into English on Vas3k.com by Vasily Zubarev. He is my pen-friend and I'm pretty sure that his blog should be spread.

Useful links

Philipp Koehn: Statistical Machine Translation. The most complete collection of methods I've found.
Moses, a popular library for building your own statistical translation systems.
OpenNMT, one more library, but for neural translators.
An article from one of my favorite bloggers explaining RNNs and LSTMs.
The video "How to Make a Language Translator": funny guy, neat explanation. Still not enough.
A text guide from TensorFlow on creating your own neural translator, for those who want more examples and to try the code.
Memsource Cloud User Manual | Manage Machine Translation via Memsource

Memsource users can now purchase machine translation characters and track machine translation character usage directly in Memsource.

Purchasing Machine Translation Characters

Currently, it is only possible to purchase characters for Microsoft Translator, Microsoft Translator Hub, and Microsoft Custom Translator. When you manage Microsoft Translator, Microsoft Translator Hub, or Microsoft Custom Translator in Memsource, you receive 2 million free characters per month.

If you select Microsoft Translator (+free characters), Microsoft Translator Hub (+free characters), or Microsoft Custom Translator (+free characters) from the list of supported MT engines, you will automatically create an MT engine that is managed via Memsource. This means you can purchase MT characters for this engine in Memsource without creating a Microsoft account.

If you have an existing Microsoft Translator or Microsoft Translator Hub account that you want to manage via Memsource, complete the following steps:

1) On the Machine Translation Settings page, select the MT engine and click Edit.
2) Select the Get free characters check box.
3) Click Save.

To opt out of managing an MT engine via Memsource, see the main Machine Translation article.

To buy more characters, select Buy Characters next to the appropriate MT engine on the Machine Translation Settings page. There are three bundles available:

2 million characters - $20 (€17)
5 million characters - $50 (€43)
10 million characters - $100 (€86)

Select a bundle and then follow the instructions on the payment pages. Once you have bought the characters, you will see that they have been added to the Remaining characters column on the Machine Translation Settings page. An invoice will have been generated automatically. A link to the invoice will be available in the green banner that appears when the payment has been successful. It can also be viewed by going to Setup > Subscription > Details. The invoice will be called Machine Translation. There is no time limit on using these characters; once purchased, they remain in your account.

How Free and Paid Characters Are Used

When using Microsoft Translator (+free characters), Microsoft Translator Hub (+free characters), or Microsoft Custom Translator (+free characters) via Memsource, your balance of free characters is topped up to 2 million every month. Unused free characters are not carried over to the next month. If you purchase characters on top of your free characters, free characters are always consumed first.

Example: You set up an MT engine and receive 2 million free characters. Then you buy another 5 million characters. During the month you consume only 1.5 million characters. This means that next month we will give you another 1.5 million free characters (topping your balance back up to 2 million), and none of the 5 million characters you purchased will have been consumed.
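A small sketch of this free-first consumption logic as we read it from the description above (the function names and the accounting model are ours, not Memsource's):

    # Hypothetical model of the character accounting described above:
    # free characters are spent before paid ones, and free characters
    # reset to the monthly quota instead of carrying over.
    FREE_MONTHLY = 2_000_000

    def consume(free, paid, used):
        """Spend `used` characters, free balance first."""
        from_free = min(free, used)
        return free - from_free, paid - (used - from_free)

    def monthly_topup(free, paid):
        """Free characters do not carry over; reset them to the quota."""
        return FREE_MONTHLY, paid

    free, paid = FREE_MONTHLY, 5_000_000   # the example from the text
    free, paid = consume(free, paid, 1_500_000)
    print(free, paid)                      # 500000 5000000 -- paid untouched
    free, paid = monthly_topup(free, paid)
    print(free, paid)                      # 2000000 5000000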
Monitoring Character Usage

On the Machine Translation Settings page you will see a usage chart for the different MT engines in your account. Currently, the chart can only display data from the past 30 days. You can view data for a specific engine by deselecting the names of the other engines at the bottom of the chart.

Please note: by default, Project Managers will only be able to see data related to projects they have created. For PMs to see all data for all projects in an organization, an Admin user for the organization will need to adjust the user settings by going to Setup > User, finding the user, and selecting View all data next to the option Home page Dashboards.

Also on the Machine Translation Settings page, you will see the usual list of engines associated with the account, with one extra column: Remaining Characters. For the supported MT engines, Microsoft Translator and Microsoft Translator Hub, you will see the number of remaining characters available for each engine. As you use characters, this number will decrease. For other MT engines, the remaining characters will be unknown.

MT News

AMTA 2018 | Proceedings for the Conference, Keynotes, Workshops and Tutorials (March 21, 2018). Main Conference Research Track; Commercial and Government Tracks; Keynotes: Arianna Bisazza, Leiden…

AMTA 2018 | Workshop | The Role of Authoritative Standards in the MT Environment (January 30, 2018). In this workshop, we will bring together experts from across the standards community, including from the American Society for Testing…

AMTA 2018 | Tutorial | ModernMT: Open-Source Adaptive Neural MT for Enterprises and Translators (January 30, 2018). Nowadays, computer-assisted translation (CAT) tools represent the dominant technology in the translation market, and those including machine translation (MT)…

AMTA 2018 | Tutorial | MQM-DQF: A Good Marriage (Translation Quality for the 21st Century) (January 30, 2018). In the past three years, the language industry has been converging on the use of the MQM-DQF framework for analytic…

AMTA 2018 | Tutorial | A Deep Learning curve for Post-Editing (January 30, 2018). Does post-editing also require a deep learning curve? How do the neural networks of post-editors work in concert with neural…

AMTA 2018 | Tutorial | De-mystifying Neural MT (January 30, 2018). Neural Machine Translation technology is progressing at a very rapid pace. In the last few years, the research community has…

AMTA 2018 | Tutorial | Getting Started Customizing MT with Microsoft Translator Hub: From Pilot Project to Production (January 30, 2018). Learn strategies to plan and carry out an effective pilot project to train…

AMTA 2018 | Tutorial | Corpora Quality Management for MT – Practices and Roles (January 17, 2018). Presenters: Nicola Ueffing (eBay MT Science), Pete Smith (University of Texas Arlington), and Silvio Picinini (eBay Localization)…

AMTA 2018 | Workshop | Translation Quality Estimation and Automatic Post-Editing (January 2, 2018). Boston, Massachusetts, March 21, 2018. The goal of quality estimation is to evaluate a translation system's quality without access to…

Researchers | Where to publish MT-related research? Here are some of the most prestigious international conferences and scientific journals that publish research papers related to machine translation.
TAUS Guidelines on Post-Editing | TAUS Post-Editing Guidelines (created in partnership with CNGL): general post-editing guidelines for "good enough" and "human translation level" quality, and post-editing pricing…

Developers | Slate from Precision Translation Tools. Precision Translation Tools announces the release of Slate, the first packaged SMT toolkit for native Windows x86-64 operating systems…

MT for Translators | During the last couple of years, machine translation post-editing has become one of the hottest and most discussed topics in the translation industry, as evidenced by conferences, forums, and webinars.

MT as part of a translation service | Machine translation as a service can be either a byproduct for teams and companies that develop MT technology for the above-mentioned use cases, or the focus of their MT technology development.

Government MT Users | Features of machine translation (MT) implementations and project efforts in official settings, regardless of jurisdiction, are guided by at least three attributes common to the administration of authority.

NIPS 2016 | Poster | Abstract

While neural machine translation (NMT) has made good progress in the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training-data bottleneck, we develop a dual-learning mechanism, which can enable an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual); the primal and dual tasks can form a closed loop and generate informative feedback signals to train the translation models, even without the involvement of a human labeler. In the dual-learning mechanism, we use one agent to represent the model for the primal task and another agent to represent the model for the dual task, then ask them to teach each other through a reinforcement learning process. Based on the feedback signals generated during this process (e.g., the language-model likelihood of the output of a model, and the reconstruction error of the original sentence after the primal and dual translations), we can iteratively update the two models until convergence (e.g., using policy gradient methods). We call the corresponding approach to neural machine translation dual-NMT. Experiments show that dual-NMT works very well on English↔French translation; in particular, by learning from monolingual data (with 10% bilingual data for a warm start), it achieves accuracy comparable to NMT trained on the full bilingual data for the French-to-English translation task.
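To show the shape of the closed loop the abstract describes, here is a deliberately toy round-trip sketch. Nothing in it is the paper's implementation: the "models" are dictionaries and the reward is a crude word-overlap score, standing in for the language-model and reconstruction signals that drive the real policy-gradient updates:

    # Schematic dual-NMT loop on toy stand-ins.
    en_fr = {"hello": "bonjour", "world": "monde"}
    fr_en = {"bonjour": "hello", "monde": "world"}

    def translate(model, sentence):
        return " ".join(model.get(w, w) for w in sentence.split())

    def reconstruction_reward(original, round_trip):
        a, b = set(original.split()), set(round_trip.split())
        return len(a & b) / max(len(a | b), 1)

    sentence = "hello world"
    french = translate(en_fr, sentence)             # primal: EN -> FR
    back = translate(fr_en, french)                 # dual:   FR -> EN
    reward = reconstruction_reward(sentence, back)  # feedback, no labels needed
    print(french, "| reward:", reward)              # bonjour monde | reward: 1.0
    # In dual-NMT this reward (plus a language-model score for `french`)
    # would drive policy-gradient updates of both translation models.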
Unbabel

Solutions for customer service: increase customer satisfaction, cut down response times, and build a more efficient operation. Unbabel for Zendesk: get multilingual with Zendesk Support, Chat, and Guide. Unbabel for Freshdesk: deliver customer support in 28 languages on Freshdesk. Unbabel for Salesforce: seamless translation solutions for Service Cloud, Knowledge, and Live Agent. Unbabel for Video: your one-stop shop for high-quality transcription, translation, and subtitling APIs.

The world's only human-quality translation pipeline

AI + Human Translation API: Get human-quality translations of your content piped where you need it.
Continuous translation: Unbabel can translate all your content seamlessly.
50,000+ editor community: AI-assisted human translators around the globe translate your content.
Final human touch: Human quality is enforced by skilled professionals before delivery.
AI, glossaries, and style guides: AI-assisted guidance ensures translation quality and speed at every step.

State-of-the-art translation: Unbabel employs custom machine translation engines using state-of-the-art neural machine translation (NMT) adapted to its customers' domains.

50,000+ editor community: We work with a community of professional translators and native speakers. They're on the move, around the globe, working on the Unbabel Platform on their computers and mobile phones.

Glossaries & style guides: Customer glossaries and style guides assure quality and consistency with your brand's voice in every translation.

Quality Estimation: Unbabel has a state-of-the-art Quality Estimation system, which has won multiple shared tasks at the Workshop on Machine Translation by wide margins. We use it to rank our translations and to identify incorrect words for our editors to pay special attention to.

Better than out-of-the-box Google, Microsoft, and Yandex: We incorporate customer-specific training data, machine translation engines adapted by content type, and a host of machine learning algorithms to beat out-of-the-box MT solutions from some of the biggest names in tech.

Developers are welcome: With a fully functional SDK for Python, and SDKs for Ruby and PHP in development, you can put Unbabel to work for you as quickly as possible.

Use Cases: The Unbabel Translation API seamlessly integrates with your workflows, business processes, websites, apps, comms platforms, and more. Build multilingual platform integrations, like we've done for Salesforce, Zendesk, and more.
CMS translation on the go: Translate within your CMS so you can easily publish new content in multiple languages. Build and partner with Unbabel: Develop an Unbabel integration with another platform and list it on our Marketplace.

Neural Machine Translation with Python

Susan Li
Jun 23, 2018
Photo credit: eLearning Industry

Machine translation, sometimes referred to by the abbreviation MT, is a very challenging task: investigating the use of software to translate text or speech from one language to another. Traditionally, it has involved large statistical models developed using highly sophisticated linguistic knowledge. Here, we are going to use deep neural networks for the problem of machine translation. We will discover how to develop a neural machine translation model for translating English to French. Our model will accept English text as input and return the French translation. To be more precise, we will practice building four models:

A simple RNN.
An RNN with embedding.
A bidirectional RNN.
An encoder-decoder model.

Training and evaluating deep neural networks is a computationally intensive task. I used an AWS EC2 instance to run all of the code. If you plan to follow along, you should have access to GPU instances.

Import the libraries

    import collections
    import helper
    import numpy as np
    import project_tests as tests
    from keras.preprocessing.text import Tokenizer
    from keras.preprocessing.sequence import pad_sequences
    from keras.models import Model
    from keras.layers import GRU, Input, Dense, TimeDistributed, Activation, RepeatVector, Bidirectional
    from keras.layers.embeddings import Embedding
    from keras.optimizers import Adam
    from keras.losses import sparse_categorical_crossentropy

I use helper.py to load the data, and project_tests.py is for testing our functions.

The Data

The dataset contains a relatively small vocabulary and can be found here. The small_vocab_en file contains English sentences, and their French translations are in the small_vocab_fr file.

Load the data

    english_sentences = helper.load_data('data/small_vocab_en')
    french_sentences = helper.load_data('data/small_vocab_fr')
    print('Dataset Loaded')

Sample sentences

Each line in small_vocab_en contains an English sentence, with the respective translation on the corresponding line of small_vocab_fr.
    for sample_i in range(2):
        print('small_vocab_en Line {}: {}'.format(sample_i + 1, english_sentences[sample_i]))
        print('small_vocab_fr Line {}: {}'.format(sample_i + 1, french_sentences[sample_i]))

Figure 1

Vocabulary

The complexity of the problem is determined by the complexity of the vocabulary: a more complex vocabulary makes for a more complex problem. Let's look at the complexity of the dataset we'll be working with.

    english_words_counter = collections.Counter(
        [word for sentence in english_sentences for word in sentence.split()])
    french_words_counter = collections.Counter(
        [word for sentence in french_sentences for word in sentence.split()])

    print('{} English words.'.format(len([word for sentence in english_sentences for word in sentence.split()])))
    print('{} unique English words.'.format(len(english_words_counter)))
    print('10 Most common words in the English dataset:')
    print('"' + '" "'.join(list(zip(*english_words_counter.most_common(10)))[0]) + '"')
    print()
    print('{} French words.'.format(len([word for sentence in french_sentences for word in sentence.split()])))
    print('{} unique French words.'.format(len(french_words_counter)))
    print('10 Most common words in the French dataset:')
    print('"' + '" "'.join(list(zip(*french_words_counter.most_common(10)))[0]) + '"')

Figure 2

Pre-process

We will convert the text into sequences of integers using the following pre-processing steps:

1. Tokenize the words into ids.
2. Add padding to make all the sequences the same length.

Tokenize

Turn each sentence into a sequence of word ids using Keras's Tokenizer. Use this function to tokenize english_sentences and french_sentences. The function tokenize returns the tokenized input and the fitted tokenizer.

    def tokenize(x):
        # Fit a tokenizer on the text and convert each sentence to word ids.
        x_tk = Tokenizer(char_level=False)
        x_tk.fit_on_texts(x)
        return x_tk.texts_to_sequences(x), x_tk

    text_sentences = [
        'The quick brown fox jumps over the lazy dog .',
        'By Jove , my quick study of lexicography won a prize .',
        'This is a short sentence .']
    text_tokenized, text_tokenizer = tokenize(text_sentences)
    print(text_tokenizer.word_index)
    print()
    for sample_i, (sent, token_sent) in enumerate(zip(text_sentences, text_tokenized)):
        print('Sequence {} in x'.format(sample_i + 1))
        print('  Input:  {}'.format(sent))
        print('  Output: {}'.format(token_sent))

Figure 3

Padding

Make sure all the English sequences have the same length, and all the French sequences have the same length, by adding padding to the end of each sequence using Keras's pad_sequences function.
    def pad(x, length=None):
        # Pad every sequence in x to `length` (default: the longest sequence).
        if length is None:
            length = max([len(sentence) for sentence in x])
        return pad_sequences(x, maxlen=length, padding='post')

    tests.test_pad(pad)

    # Pad the tokenized output
    test_pad = pad(text_tokenized)
    for sample_i, (token_sent, pad_sent) in enumerate(zip(text_tokenized, test_pad)):
        print('Sequence {} in x'.format(sample_i + 1))
        print('  Input:  {}'.format(np.array(token_sent)))
        print('  Output: {}'.format(pad_sent))

Figure 4

Pre-process Pipeline

Implement a preprocess function:

    def preprocess(x, y):
        preprocess_x, x_tk = tokenize(x)
        preprocess_y, y_tk = tokenize(y)
        preprocess_x = pad(preprocess_x)
        preprocess_y = pad(preprocess_y)
        # Keras's sparse_categorical_crossentropy requires the labels in 3 dimensions.
        preprocess_y = preprocess_y.reshape(*preprocess_y.shape, 1)
        return preprocess_x, preprocess_y, x_tk, y_tk

    preproc_english_sentences, preproc_french_sentences, english_tokenizer, french_tokenizer = \
        preprocess(english_sentences, french_sentences)

    max_english_sequence_length = preproc_english_sentences.shape[1]
    max_french_sequence_length = preproc_french_sentences.shape[1]
    english_vocab_size = len(english_tokenizer.word_index)
    french_vocab_size = len(french_tokenizer.word_index)

    print('Data Preprocessed')
    print("Max English sentence length:", max_english_sequence_length)
    print("Max French sentence length:", max_french_sequence_length)
    print("English vocabulary size:", english_vocab_size)
    print("French vocabulary size:", french_vocab_size)

Figure 5

Models

In this section, we will experiment with various neural network architectures. We will begin by training four relatively simple architectures:

Model 1 is a simple RNN.
Model 2 is an RNN with embedding.
Model 3 is a bidirectional RNN.
Model 4 is an encoder-decoder RNN.

After experimenting with the four simple architectures, we will construct a deeper model designed to outperform all four.

Ids Back to Text

The neural network will translate the input into sequences of word ids, which isn't the final form we want: we want the French translation. The function logits_to_text bridges the gap between the logits from the neural network and the French translation. We will use this function to better understand the output of the neural network.

    def logits_to_text(logits, tokenizer):
        # Map each time step's most probable word id back to its word.
        index_to_words = {id: word for word, id in tokenizer.word_index.items()}
        index_to_words[0] = '<PAD>'
        return ' '.join([index_to_words[prediction] for prediction in np.argmax(logits, 1)])

    print('`logits_to_text` function loaded.')

Model 1: RNN

Figure 6

We start by creating a basic RNN model, which is a good baseline for sequence data translating English to French.
    def simple_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
        learning_rate = 1e-3
        input_seq = Input(input_shape[1:])
        rnn = GRU(64, return_sequences=True)(input_seq)
        logits = TimeDistributed(Dense(french_vocab_size))(rnn)
        model = Model(input_seq, Activation('softmax')(logits))
        model.compile(loss=sparse_categorical_crossentropy,
                      optimizer=Adam(learning_rate),
                      metrics=['accuracy'])
        return model

    tests.test_simple_model(simple_model)

    tmp_x = pad(preproc_english_sentences, max_french_sequence_length)
    tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2], 1))

    # Train the neural network
    simple_rnn_model = simple_model(
        tmp_x.shape,
        max_french_sequence_length,
        english_vocab_size,
        french_vocab_size)
    simple_rnn_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=10, validation_split=0.2)

    # Print prediction(s)
    print(logits_to_text(simple_rnn_model.predict(tmp_x[:1])[0], french_tokenizer))

Figure 7

The basic RNN model's validation accuracy ends at 0.6039.

Model 2: Embedding

Figure 8

An embedding is a vector representation of a word that sits close to similar words in n-dimensional space, where n is the size of the embedding vectors. We will create an RNN model using embedding.

    from keras.models import Sequential

    def embed_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
        learning_rate = 1e-3
        rnn = GRU(64, return_sequences=True, activation="tanh")
        embedding = Embedding(french_vocab_size, 64, input_length=input_shape[1])
        logits = TimeDistributed(Dense(french_vocab_size, activation="softmax"))
        model = Sequential()
        # Embedding can only be used as the first layer (see the Keras documentation).
        model.add(embedding)
        model.add(rnn)
        model.add(logits)
        model.compile(loss=sparse_categorical_crossentropy,
                      optimizer=Adam(learning_rate),
                      metrics=['accuracy'])
        return model

    tests.test_embed_model(embed_model)

    tmp_x = pad(preproc_english_sentences, max_french_sequence_length)
    tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2]))

    embeded_model = embed_model(
        tmp_x.shape,
        max_french_sequence_length,
        english_vocab_size,
        french_vocab_size)
    embeded_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=10, validation_split=0.2)

    print(logits_to_text(embeded_model.predict(tmp_x[:1])[0], french_tokenizer))

Figure 9

The embedding model's validation accuracy ends at 0.8401.

Model 3: Bidirectional RNNs

Figure 10

    def bd_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
        learning_rate = 1e-3
        model = Sequential()
        model.add(Bidirectional(GRU(128, return_sequences=True, dropout=0.1),
                                input_shape=input_shape[1:]))
        model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))
        model.compile(loss=sparse_categorical_crossentropy,
                      optimizer=Adam(learning_rate),
                      metrics=['accuracy'])
        return model

    tests.test_bd_model(bd_model)

    tmp_x = pad(preproc_english_sentences, preproc_french_sentences.shape[1])
    tmp_x = tmp_x.reshape((-1, preproc_french_sentences.shape[-2], 1))

    bidi_model = bd_model(
        tmp_x.shape,
        preproc_french_sentences.shape[1],
        len(english_tokenizer.word_index) + 1,
        len(french_tokenizer.word_index) + 1)
    bidi_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=20, validation_split=0.2)

    # Print prediction(s)
    print(logits_to_text(bidi_model.predict(tmp_x[:1])[0], french_tokenizer))

Figure 11

The bidirectional RNN model's validation accuracy ends at 0.5992.
Model 4: Encoder-Decoder

The encoder creates a matrix representation of the sentence; the decoder takes this representation as input and predicts the translation as output.

    def encdec_model(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
        learning_rate = 1e-3
        model = Sequential()
        # Encoder: compress the input sequence into a single vector.
        model.add(GRU(128, input_shape=input_shape[1:], return_sequences=False))
        model.add(RepeatVector(output_sequence_length))
        # Decoder: unfold the vector into the output sequence.
        model.add(GRU(128, return_sequences=True))
        model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))
        model.compile(loss=sparse_categorical_crossentropy,
                      optimizer=Adam(learning_rate),
                      metrics=['accuracy'])
        return model

    tests.test_encdec_model(encdec_model)

    tmp_x = pad(preproc_english_sentences)
    tmp_x = tmp_x.reshape((-1, preproc_english_sentences.shape[1], 1))

    encodeco_model = encdec_model(
        tmp_x.shape,
        preproc_french_sentences.shape[1],
        len(english_tokenizer.word_index) + 1,
        len(french_tokenizer.word_index) + 1)
    encodeco_model.fit(tmp_x, preproc_french_sentences, batch_size=1024, epochs=20, validation_split=0.2)

    print(logits_to_text(encodeco_model.predict(tmp_x[:1])[0], french_tokenizer))

Figure 12

The encoder-decoder model's validation accuracy ends at 0.6406.

Model 5: Custom

Create a model_final that incorporates both embedding and a bidirectional RNN into one model. At this stage we need to experiment a little, for example changing the GRU units to 256, changing the learning rate to 0.005, or training the model for more (or fewer) than 20 epochs.

    def model_final(input_shape, output_sequence_length, english_vocab_size, french_vocab_size):
        model = Sequential()
        model.add(Embedding(input_dim=english_vocab_size, output_dim=128, input_length=input_shape[1]))
        model.add(Bidirectional(GRU(256, return_sequences=False)))
        model.add(RepeatVector(output_sequence_length))
        model.add(Bidirectional(GRU(256, return_sequences=True)))
        model.add(TimeDistributed(Dense(french_vocab_size, activation='softmax')))
        learning_rate = 0.005
        model.compile(loss=sparse_categorical_crossentropy,
                      optimizer=Adam(learning_rate),
                      metrics=['accuracy'])
        return model

    tests.test_model_final(model_final)
    print('Final Model Loaded')

Prediction

    def final_predictions(x, y, x_tk, y_tk):
        tmp_X = pad(preproc_english_sentences)
        model = model_final(tmp_X.shape,
                            preproc_french_sentences.shape[1],
                            len(english_tokenizer.word_index) + 1,
                            len(french_tokenizer.word_index) + 1)
        model.fit(tmp_X, preproc_french_sentences, batch_size=1024, epochs=17, validation_split=0.2)

        y_id_to_word = {value: key for key, value in y_tk.word_index.items()}
        y_id_to_word[0] = '<PAD>'

        sentence = 'he saw a old yellow truck'
        sentence = [x_tk.word_index[word] for word in sentence.split()]
        sentence = pad_sequences([sentence], maxlen=x.shape[-1], padding='post')
        sentences = np.array([sentence[0], x[0]])
        predictions = model.predict(sentences, len(sentences))

        print('Sample 1:')
        print(' '.join([y_id_to_word[np.argmax(x)] for x in predictions[0]]))
        print('Il a vu un vieux camion jaune')
        print('Sample 2:')
        print(' '.join([y_id_to_word[np.argmax(x)] for x in predictions[1]]))
        print(' '.join([y_id_to_word[np.max(x)] for x in y[0]]))

    final_predictions(preproc_english_sentences, preproc_french_sentences, english_tokenizer, french_tokenizer)

Figure 13

We get perfect translations on both sentences and a 0.9776 validation accuracy score! The source code can be found on GitHub. I look forward to hearing feedback or questions.
MIT Technology Review | Business Impact

Human translators are still on top—for now

Machine translation works well for sentences but turns out to falter at the document level, computational linguists have found.

by Emerging Technology from the arXiv
September 5, 2018

You may have missed the popping of champagne corks and the shower of ticker tape, but in recent months computational linguists have begun to claim that neural machine translation now matches the performance of human translators.

The technique of using a neural network to translate text from one language into another has improved by leaps and bounds in recent years, thanks to ongoing breakthroughs in machine learning and artificial intelligence. So it is not really a surprise that machines have approached the performance of humans. Indeed, computational linguists have good evidence to back up this claim.

But today, Samuel Laubli at the University of Zurich and a couple of colleagues say the champagne should go back on ice. They do not dispute their colleagues' results, but say the testing protocol fails to take account of the way humans read entire documents. When this is assessed, machines lag significantly behind humans, they say.

At issue is how machine translation should be evaluated. This is currently done on two measures: adequacy and fluency. The adequacy of a translation is determined by professional human translators who read both the original text and the translation to see how well it expresses the meaning of the source. Fluency is judged by monolingual readers who see only the translation and determine how well it is expressed in English. Computational linguists agree that this system gives useful ratings.
But according to Laubli and co, the current protocol only compares translations at the sentence level, whereas humans also evaluate text at the document level. So they have developed a new protocol to compare the performance of machine and human translators at the document level. They asked professional translators to assess how well machines and humans translated over 100 news articles written in Chinese into English. The examiners rated each translation for adequacy and fluency at the sentence level but, crucially, also at the level of the entire document.

The results make for interesting reading. To start with, Laubli and co found no significant difference in the way professional translators rated the adequacy of machine- and human-translated sentences. By this measure, humans and machines are equally good translators, which is in line with previous findings. However, when it comes to evaluating the entire document, human translations are rated as more adequate and more fluent than machine translations. "Human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences," they say.

The researchers think they know why. "We hypothesise that document-level evaluation unveils errors such as mistranslation of an ambiguous word, or errors related to textual cohesion and coherence, which remain hard or impossible to spot in a sentence-level evaluation," they say. The team gives the example of a new app called "微信挪车," which humans consistently translate as "WeChat Move the Car" but which machines often translate in several different ways in the same article, such as "Move Car," "WeChat mobile," and "WeChat Move." This kind of inconsistency, say Laubli and co, makes documents harder to follow.

This suggests that the way machine translation is evaluated needs to evolve away from a system where machines consider each sentence in isolation. "As machine translation quality improves, translations will become harder to discriminate in terms of quality, and it may be time to shift towards document-level evaluation, which gives raters more context to understand the original text and its translation, and also exposes translation errors related to discourse phenomena which remain invisible in a sentence-level evaluation," say Laubli and co. That change should help machine translation improve. Which means it is still set to surpass human translation—just not yet.

Ref: arxiv.org/abs/1808.07048: Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation
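The terminology inconsistency Laubli and co describe is also easy to surface mechanically. As a rough illustration only (a hypothetical helper, not the evaluation protocol from the paper), one could scan a machine-translated document for competing renderings of a single source term:

    def distinct_renderings(translated_doc, candidates):
        """Return which candidate renderings of one source term occur in a
        machine-translated document; more than one suggests incoherence."""
        found = set()
        for sentence in translated_doc:
            # Match longest candidates first so "WeChat Move" does not
            # shadow "WeChat Move the Car".
            for cand in sorted(candidates, key=len, reverse=True):
                if cand in sentence:
                    found.add(cand)
                    break
        return found

    mt_output = [
        "Users can notify a driver through WeChat mobile.",
        "WeChat Move the Car launched last year.",
    ]
    print(distinct_renderings(mt_output,
                              ["WeChat Move the Car", "WeChat mobile", "WeChat Move"]))
    # {'WeChat Move the Car', 'WeChat mobile'} -> two renderings, inconsistent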
Iconic Translation Machines Ltd.

E-discovery Translation: Iconic's e-discovery solutions provide better-quality, secure, robust, real-time translation tailored to your legal case, allowing you to streamline multilingual document review of ESI and making the process faster, more effective, and more cost-efficient.

Enterprise Machine Translation: Iconic delivers high-quality customised machine translation solutions for enterprise users, adapted to your language, content, and style by our team of linguistic experts. It's MT with subject matter expertise.

Patent Translation Solutions: We created the world's first patent-specific machine translation engines. Our IPTranslator technology offers best-in-class machine translation performance for the translation of patents and related documents.

The Enterprise Machine Translation System: We develop Neural Machine Translation with Subject Matter Expertise, enabling the world's largest corporations, service providers, and government organisations to adopt specialist AI-powered translation solutions of superior quality.

Neural Machine Translation: Our proprietary Ensemble Architecture enables superior MT engines with a mix of neural, statistical, rule-based, and linguistic-engineering techniques adapted to suit each content type and language.

Enterprise Solutions: We provide leading enterprise MT solutions, developed by our expert team of MT PhDs and specialist engineers. We constantly innovate to deliver cutting-edge MT software solutions and ensure that your quality requirements are exceeded.

E-discovery Translation: Our e-discovery translation software empowers you to search for and find the most relevant documents in your multilingual content, in your language, at a moment's notice. Translate vast amounts of foreign-language ESI quickly, securely, and effectively.

Case Studies: Our company is committed to providing reliable solutions in the long run. Read about our clients who have successfully adopted MT in their business and the benefits they saw to the bottom line. "Iconic delivered measurable productivity gains from the outset. Rarely have we seen the complexities and unforeseen but inevitable surprises of MT integration in large-scale production processes handled as competently and efficiently."
Co-authors: Angelika Clayton and Bing Zhao

The need for economic opportunity is global, and that is represented by the fact that more than half of LinkedIn's active members live outside of the U.S. Engagement across language barriers and borders comes with a certain set of challenges—one of which is providing a way for members to communicate in their native language. In fact, translation of member posts has been one of our most requested features, and now it's finally here.

Dynamic (immediate) translation in the feed has been a tiger-team effort from the get-go: a team of passionate localization evangelists and hungry engineers took on the challenge of realizing an opportunity that relied heavily on collaboration across different teams. We began with a small prototype to prove and test a concept, and ramped to a very small section of our membership. As the concept proved successful, we used that experience to develop a more scalable solution incorporating more languages. There are three central components that we had to incorporate: language detection, machine translation (MT), and feed experience.

[seetranslation1]

Language detection and tagging

We separated the processes of content language detection and actual translation to improve the member experience with international content in the feed. Separating the content language detection step from translation allowed us to build a base for flexible, efficient dynamic language translation, to expand support for various content types, and to generate data for the use of relevance and analytics teams. Language detection is a near-real-time application processing high volumes of member-generated content data distributed across multiple Espresso stores. Instead of consuming directly from the databases, we needed access to all the database changes without impacting the online queries. For this reason, we chose Brooklin, used at LinkedIn as a change data capture service, to stream change events from Espresso.
Our language detection application consumes the change stream containing events for each write performed on the content databases.

[seetranslation2]

To improve language detection quality, the data extracted by Samza jobs goes through filtering and cleansing (for example, mentions and hashtags are excluded from the language detection process). Filtered data is forwarded via the LinkedIn GaaP service (Gateway-as-a-Service) to the Microsoft Text Analytics API, an Azure Cognitive Service that can detect up to 120 languages. The data is tagged with the language detection results, i.e., locale ID and confidence score, and is available for processing by other applications. In the content language detection and tagging process, we utilize multiple open source frameworks, services, and tools originally developed by LinkedIn, such as Kafka, Samza, and Rest.li.

Feed experience

The initial small-scale prototype on short-form member posts involved the implementation of a "See Translation" button whenever the language of the post, detected through a separate network call to the Microsoft Translator API (another Azure Cognitive Service), did not match the member's interface language. When clicked, the button would display the text translated into the member's interface language. The prototype was a proof of concept for internal ramping and a very limited external ramp, as a learning and evaluation exercise. It was very successful in that member feedback was positive both in terms of the value of the feature itself and the quality of the translated content. The prototype also allowed us to identify several areas that needed to be improved before we ramped to all members and all feed content:

Locale detection: When the prototype was released, our service was making dual calls to Microsoft, one for language detection and one for translation, which was fine for a prototype but too slow to scale the experience. It also meant that we did not retain the locale of unique content for statistical analysis.

Locale comparison: This is new logic that did not exist in the prototype. Now, we take the locale inferred asynchronously by language detection and compare it with the member's interface locale. We no longer need to request this from Microsoft, as we were doing for the prototype, which significantly reduces the number of calls made. We now only render the "See translation" button if those locales differ, which makes for a much more intuitive member experience.

Other content types: The prototype only worked on original posts; the new model also renders the functionality on root shares, viral shares, and re-shares of organic updates.

Our current design is split into two main flows: Translation Render and Translation Trigger.

Translation Render flow: [seetranslation3]

Translation Trigger flow: [seetranslation4]

Polyglot-Online

The Polyglot-Online mid-tier service uses GaaP to safely send encrypted text snippets to the Translator Text API. An additional advantage of this framework is the ability to customize the translation models for a specific domain (like our feed) and to integrate logic for filtering translation outputs based on system confidence scores. The API supports more than 60 languages in any translation direction, all of which we can leverage once the source-language locale of a piece of content has been detected. For this feed feature, we selectively translate source text into 24 target languages, to match each member's interface locale supported by LinkedIn.
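As an illustration of the detect-then-compare flow described above, here is a minimal sketch that calls the public Azure Text Analytics v3.0 languages endpoint directly. The endpoint, key, and helper names are placeholder assumptions (LinkedIn routes this traffic through its internal GaaP gateway, and in the scaled design the locale is pre-tagged asynchronously rather than detected inline), but the comparison logic is the same:

    import requests

    AZURE_ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"  # placeholder
    AZURE_KEY = "<subscription-key>"  # placeholder

    def detect_locale(text):
        """Return (ISO 639-1 language code, confidence) for a content snippet."""
        resp = requests.post(
            f"{AZURE_ENDPOINT}/text/analytics/v3.0/languages",
            headers={"Ocp-Apim-Subscription-Key": AZURE_KEY},
            json={"documents": [{"id": "1", "text": text}]},
        )
        resp.raise_for_status()
        lang = resp.json()["documents"][0]["detectedLanguage"]
        return lang["iso6391Name"], lang["confidenceScore"]

    def should_offer_translation(post_text, member_interface_locale):
        """Render the 'See translation' button only when the post's locale
        differs from the member's interface locale."""
        detected, confidence = detect_locale(post_text)
        return confidence > 0.8 and detected != member_interface_locale

    # A French post viewed with an English interface locale -> True
    print(should_offer_translation("Bienvenue sur LinkedIn", "en"))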
This translation service also has features like logic for protecting entities such as hashtags and name mentions from being distorted in translation (a rough sketch of this placeholder idea appears after the OpenNMT overview below), and integrated filters to block irrelevant or unprofessional content, as well as advertisements, from being translated on the LinkedIn platform. We also use an in-memory encrypted cache to reduce latency; with its lightweight maintenance and better cost-to-serve than centralized solutions, and built on the Java Play framework used at LinkedIn, the service easily supported multiple thousands of QPS during our prototype ramp.

Acknowledgements

Many thanks to Weizhi (Sam) Meng and Chang Liu for great coding and ownership, to David Snider for initiating the project, and to Annie Lin for writing GaaP scripts. We also want to thank Ian Fox for his work with Azure, Pradeepta Dash for engineering support for the feed, Atul Purohit for guidance with the feed API implementation, Jeremy Kao for guidance with web, Samish Kolli for client-side support, Nathan Hibner for his many contributions in tweaking the model, and Chao Zhang for the expert answers about overall backend functionality. Additionally, we want to recognize our helpful friends at Microsoft: Ashish Makadia, Assaf Israel, and Brian Smith from the Text Analytics team, and Chris Wendt and Arul Menezes from the Translator team. Finally, a huge thank you to Francis Tsang and Tetyana Bruevich for their endless support. We hope our members enjoy this new feature!

OpenNMT: an open source neural machine translation system

OpenNMT is an open source (MIT) initiative for neural machine translation and neural sequence modeling.

[simple-attn.png]

Since its launch in December 2016, OpenNMT has become a collection of implementations targeting both academia and industry. The systems are designed to be simple to use and easy to extend, while maintaining efficiency and state-of-the-art accuracy. OpenNMT currently has three main implementations:

OpenNMT-lua (a.k.a. OpenNMT): the original project, developed with LuaTorch. Full-featured, optimized, and stable code ready for quick experiments and production.

OpenNMT-py: an OpenNMT-lua clone using the more modern PyTorch. Initially created by the Facebook AI research team as an example, this implementation is easier to extend and particularly suited for research.

OpenNMT-tf: a TensorFlow alternative. The more recent project, focusing on large-scale experiments and high-performance model serving using the latest TensorFlow features.

All versions are currently maintained. Common features include: a simple general-purpose interface, requiring only source/target files; highly configurable models and training procedures; recent research features to improve system performance; extensions to allow other sequence generation tasks such as summarization, image-to-text, or speech recognition; and an active community welcoming both academic and industrial requests and contributions.
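Circling back to the entity protection mentioned in the LinkedIn section above: a common way to keep hashtags and mentions intact is to swap them for opaque placeholders before translation and restore them afterwards. The sketch below is a hypothetical illustration of that idea, not LinkedIn's actual code; production systems would typically use the translation API's own no-translate markup, since bare placeholders can themselves be mangled by MT:

    import re

    TOKEN = re.compile(r"[#@]\w+")  # hashtags and @mentions

    def protect(text):
        """Replace hashtags/mentions with placeholders the MT engine passes through."""
        entities = TOKEN.findall(text)
        for i, ent in enumerate(entities):
            text = text.replace(ent, f"__ENT{i}__", 1)
        return text, entities

    def restore(text, entities):
        """Put the original entities back after translation."""
        for i, ent in enumerate(entities):
            text = text.replace(f"__ENT{i}__", ent, 1)
        return text

    def translate_protected(text, translate):
        masked, entities = protect(text)
        return restore(translate(masked), entities)

    # Usage with a stand-in translator:
    fake_mt = lambda s: s.replace("Bonjour", "Hello")
    print(translate_protected("Bonjour @susanli #NLP", fake_mt))
    # -> "Hello @susanli #NLP"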
Sabrina Girletti, Doctoral Assistant, Faculty of Translation and Interpreting (FTI), University of Geneva
Phone: +41 22 37 98685. Office: 6339 - Uni Mail. Email: Sabrina.Girletti(at)unige.ch

Teaching activities: Localisation; Machine Translation I.
Research interests: localisation, machine translation, post-editing (MT), CAT tools.

Summary: Sabrina Girletti is a research and teaching assistant at the Translation Technology Department of the Faculty of Translation and Interpreting (FTI), where she contributes to postgraduate courses in machine translation and localisation. Her research interests include post-editing approaches and human factors in machine translation. She is currently involved in a project testing the implementation of machine translation at Swiss Post. Sabrina holds a master's degree in Translation, with a specialisation in Translation Technologies, from the University of Geneva, and a bachelor's degree in Linguistic and Cultural Mediation from the University of Naples "L'Orientale".

Machine Translation (MT)

Leverage a tailor-made machine translation engine based on your company's unique data.

Customized Engines, All Private: Rather than using error-prone free machine translation services, Venga leverages commercial machine translation (MT) engines. After factoring in your insights, we customize everything to reflect your company's precise localization and budgetary needs. The end result is a private MT engine which is totally in tune with your content.
Built in Days, Not Months: In a business world where time is always of the essence, MT-handled localization projects can measurably cut costs and save time. This is especially important if rapid turnarounds are essential for your company's success. To ensure speedier access to major overseas markets, our engineers can build you a private, customized MT engine in just days.

Light or Heavy Post-Editing: With sufficient preparation and customization, our MT engines yield context-correct translations requiring only minimal post-editing. Depending on your needs, our language specialists can then conduct either light or heavy post-editing to ensure all translated documents are of consistently high quality.

Data Refining Process: By seamlessly integrating your glossaries and translation memories into your company's private MT engine, our team will help you measurably improve content quality. By making it easier to identify issues and correct errors in your translation assets, they will also further enhance your MT engine's accuracy and quality.

Machine Translation Analytics: Our customized MT engine will also provide your project with translation statistics documenting the percentages you have leveraged from your previously approved translated content as compared to human translations of the same text. In other words, our analytics will detail the evolution of your company's MT engine. For more information, download our Machine Translation Service Description.

Who we are: Having originated in the software industry, we use our twenty-plus years' experience globalizing information-based technology products to help our clients succeed internationally. Venga offers translation, localization, and global creative services to enable clients in any industry to reach new markets faster.

Lucy LT - The Machine Translation Solution

Lucy LT - for secure, cost-effective international communication. If you want to communicate internationally in multiple languages, increase revenue by reaching audiences in additional languages, cut translation costs, and reduce translation turnaround times, then check out what Lucy LT has to offer. Lucy LT is already helping these customers to communicate more efficiently in international markets. What can we do for you?

Your benefits: Lucy LT is secure. Lucy LT supports multiple text formats. Lucy LT offers a great number of language combinations. Lucy LT integrates with TM systems such as SDL Trados. Lucy LT can be embedded in end-to-end documentation processes with editors such as Adobe InDesign. Lucy LT is modular, adaptable, and scalable. Lucy LT is fast. Lucy LT is efficient (small hardware footprint).

Do you need a real-time translation for gisting purposes? Check out our online MT system, KWIK Translator.

Contact: Lucy Software and Services GmbH, Neidensteiner Str. 2, D-74915 Waibstadt. Tel. +49 7263-40930-0, info@lucysoftware.com